fix: stabilise PersistentActorRecoveryTimeoutSpec #2906

Open
He-Pin wants to merge 1 commit into apache:main from He-Pin:fix-persistent-actor-recovery-timeout-flake

He-Pin (Member) commented Apr 25, 2026

Motivation:
PersistentActorRecoveryTimeoutSpec can flake under CI load because the receive-timeout test reuses the same 3s stepped recovery timeout path as the test that intentionally times out recovery. If the second replay operation is delayed, the recovery timeout tick can fire first, and SteppingInmemJournal.step then blocks until its own ask timeout expires.

Modification:

  • Let SteppingInmemJournal read instance-id from the plugin config passed to the journal actor, while preserving the no-arg fallback for existing tests.
  • Give the receive-timeout scenario its own stepping journal instance with a wider recovery timeout.
  • Consume the first RecoveryCompleted signal and release both replay tokens up front so the assertion verifies successful recovery rather than racing the recovery timeout.
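
For readers who have not opened the diff, the first two points above could look roughly like the configuration below. This is only a sketch under assumptions: the plugin id `stepping-inmem-receive-timeout`, both `instance-id` values, and the 30s value are hypothetical and not taken from the PR. Only `instance-id`, `recovery-event-timeout`, and the 3s timeout are named in the description or are standard journal plugin settings.

```hocon
# Hypothetical sketch, not the PR's actual config.
pekko.persistence.journal {
  # Stepped journal for the test that intentionally times out recovery.
  stepping-inmem {
    class = "org.apache.pekko.persistence.journal.SteppingInmemJournal"
    instance-id = "recovery-timeout"            # assumed value
    recovery-event-timeout = 3s                 # the 3s window from the description
  }

  # Assumed second plugin entry for the receive-timeout scenario.
  # Because the journal reads instance-id from its own plugin config,
  # the two instances can coexist in one actor system.
  stepping-inmem-receive-timeout {              # hypothetical plugin id
    class = "org.apache.pekko.persistence.journal.SteppingInmemJournal"
    instance-id = "receive-timeout"             # assumed value
    recovery-event-timeout = 30s                # assumed "wider" timeout
  }
}
```

Under this sketch, the persistent actor in the receive-timeout test would select the second entry by overriding `journalPluginId`, leaving the 3s journal untouched for the failure test.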

Result:
The timeout failure test still exercises the 3s recovery timeout, and the receive-timeout test now checks that successful recovery preserves the actor receive timeout without depending on that 3s timeout window.

Tests:

  • sbt "persistence / Test / testOnly org.apache.pekko.persistence.PersistentActorRecoveryTimeoutSpec"
  • sbt -Dpekko.test.timefactor=2 "persistence / Test / testOnly org.apache.pekko.persistence.PersistentActorRecoveryTimeoutSpec"
  • sbt -Dpekko.test.timefactor=2 "persistence-typed-tests / Test / testOnly org.apache.pekko.persistence.typed.scaladsl.EventSourcedBehaviorRecoveryTimeoutSpec org.apache.pekko.persistence.typed.scaladsl.EventSourcedStashOverflowSpec"
  • sbt -Dpekko.test.timefactor=2 "persistence / Test / testOnly org.apache.pekko.persistence.SteppingInMemPersistentActorStashingSpec"
  • sbt -Dpekko.test.timefactor=2 "persistence / Test / testOnly org.apache.pekko.persistence.ThrowExceptionStrategyPersistentActorBoundedStashingSpec"
  • sbt -Dpekko.test.timefactor=2 "persistence / Test / testOnly org.apache.pekko.persistence.DiscardStrategyPersistentActorBoundedStashingSpec"
  • sbt -Dpekko.test.timefactor=2 "persistence / Test / testOnly org.apache.pekko.persistence.ReplyToStrategyPersistentActorBoundedStashingSpec"
  • sbt "persistence / Test / scalafmtCheck"

Motivation:
PersistentActorRecoveryTimeoutSpec can flake under CI load when the receive-timeout test reuses the same 3s recovery timeout path as the test that intentionally times out recovery. The second stepped replay operation can race the recovery timeout and leave SteppingInmemJournal.step waiting until its ask timeout.

Modification:
Allow SteppingInmemJournal to read instance-id from the plugin config passed to the test journal actor, while keeping the no-arg fallback for existing tests. Use a separate stepping journal instance with a wider recovery timeout for the receive-timeout scenario, consume the first RecoveryCompleted signal, and release both replay tokens up front.

Result:
The recovery-timeout failure test remains a 3s timeout check, while the receive-timeout test verifies successful recovery without racing that timeout.