fix: stabilise PersistentActorRecoveryTimeoutSpec #2906
Open
He-Pin wants to merge 1 commit into apache:main
Motivation:
PersistentActorRecoveryTimeoutSpec can flake under CI load when the receive-timeout test reuses the same 3s recovery timeout path as the test that intentionally times out recovery. The second stepped replay operation can race the recovery timeout and leave SteppingInmemJournal.step waiting until its ask timeout.
Modification:
Allow SteppingInmemJournal to read instance-id from the plugin config passed to the test journal actor, while keeping the no-arg fallback for existing tests. Use a separate stepping journal instance with a wider recovery timeout for the receive-timeout scenario, consume the first RecoveryCompleted signal, and release both replay tokens up front.
Result:
The recovery-timeout failure test remains a 3s timeout check, while the receive-timeout test verifies successful recovery without racing that timeout.
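The two-journal setup described under Modification could look roughly like the config fragment below. This is a sketch under stated assumptions, not the configuration from the commit: the plugin id `stepping-inmem-receive-timeout`, both `instance-id` values, and the 30s value are hypothetical, and it assumes the timeout being widened is the journal plugin's `recovery-event-timeout` setting.

```hocon
# Illustrative sketch only; ids and the 30s value are assumptions.
pekko.persistence.journal {
  # Journal for the test that intentionally times out recovery (3s).
  stepping-inmem {
    class = "org.apache.pekko.persistence.journal.SteppingInmemJournal"
    instance-id = "recovery-timeout-spec"            # hypothetical id
    recovery-event-timeout = 3s
  }

  # Separate journal instance for the receive-timeout test, with a
  # wider timeout so stepped replay cannot race the recovery tick.
  stepping-inmem-receive-timeout {                   # hypothetical plugin id
    class = "org.apache.pekko.persistence.journal.SteppingInmemJournal"
    instance-id = "receive-timeout-spec"             # hypothetical id
    recovery-event-timeout = 30s                     # assumed "wider" value
  }
}
```

With a layout like this, the receive-timeout test actor would presumably select the second plugin by overriding `journalPluginId`, leaving the 3s path used only by the timeout-failure test.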
Motivation:
PersistentActorRecoveryTimeoutSpec can flake under CI load when the receive-timeout test reuses the same 3s stepped recovery timeout path as the test that intentionally times out recovery. If the second replay operation is delayed, the recovery tick can win and SteppingInmemJournal.step waits until its ask timeout.
Modification:
Result:
The timeout failure test still exercises the 3s recovery timeout, and the receive-timeout test now checks that successful recovery preserves the actor receive timeout without depending on that 3s timeout window.
Tests: