De-flake test_resample_merge_system: assert samples, not message count #165

Merged
cboulay merged 1 commit into dev from deflake/resample-merge-system-msgcount on Jul 2, 2026

cboulay (Member) commented Jul 2, 2026

Summary

test_resample_merge_healthy_system intermittently fails on Windows CI (e.g. run 28556535265) with:

AssertionError: Healthy graph should produce ~60 merged messages, got 29.
assert 29 >= 50

The graph terminates normally under subscriber backpressure — nothing crashes. The tests asserted on quantities that a scheduling/reset race makes non-deterministic.

Why counts are unreliable

  • Message count is a coalescing artifact: under backpressure the same data arrives packed into a variable number of messages — measured 108 locally vs 29 on a loaded runner.
  • Total samples is not a safe substitute for the glitch cases: the reference-reset re-anchoring drops a timing-dependent amount of data, so the recovered run measured 1710 samples locally but 960 on CI (this is what a naive samples-based first attempt would have re-flaked on).
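The coalescing effect can be reproduced in isolation. The following is a minimal sketch, not the project's test code: `Chunk`, `pack`, the `last_t` helper, the 1000 Hz sample rate, and the chunk sizes are all illustrative assumptions. It packs the same 1800 samples into messages of two different sizes and shows that the message count changes while the stream-time of the final sample does not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical stand-in for one received message."""
    t0: float       # stream-time of the first sample in the chunk (seconds)
    n_samples: int  # number of samples packed into the chunk


def pack(n_total: int, chunk_size: int, fs: float) -> List[Chunk]:
    """Split n_total samples into messages of at most chunk_size samples."""
    chunks = []
    start = 0
    while start < n_total:
        n = min(chunk_size, n_total - start)
        chunks.append(Chunk(t0=start / fs, n_samples=n))
        start += n
    return chunks


def last_t(chunks: List[Chunk], fs: float) -> float:
    """Stream-time of the final emitted sample."""
    tail = chunks[-1]
    return tail.t0 + (tail.n_samples - 1) / fs


FS = 1000.0  # assumed sample rate, for illustration only

fine = pack(1800, 17, FS)    # lightly loaded: many small messages
coarse = pack(1800, 63, FS)  # backpressure: same data, fewer and larger messages

print(len(fine), len(coarse))                                # 106 29
print(f"{last_t(fine, FS):.3f} {last_t(coarse, FS):.3f}")    # 1.799 1.799
```

An assertion such as `n_msgs >= 50` passes for the first packing and fails for the second, even though both carry identical data.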

The robust invariant: stream progress

Assert on last_t — the stream-time of the final emitted sample, i.e. how far through the signal the output reached. It's insensitive to coalescing and to mid-stream sample drops; it only falls if the tail is truncated. Measured dead-stable across runs:

| test | n_msgs (unstable) | total samples (unstable for resets) | last_t (stable) |
| --- | --- | --- | --- |
| healthy | 108 (29 on CI) | 1800 | 1.799 s |
| seize | 18 | 450 | 0.449 s |
| recover | 60 | 1710 (960 on CI) | 0.799 s |
| resampleconcat | 61 | 1740 | 0.799 s |

Thresholds sit with a wide margin between the regimes: healthy > 1.5 s, seize < 0.6 s, recover/composite > 0.6 s (the seized run stops at the glitch at ~0.45 s; recovered runs continue to the re-anchored end at ~0.8 s). Channel-count checks are kept; n_msgs > 0 is retained only as a liveness sanity check in the seize case.
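As a rough illustration of how these thresholds separate the regimes, the sketch below applies them to the measured `last_t` values reported above. The function and constant names are hypothetical and are not taken from the test file; only the threshold and measurement values come from this PR.

```python
HEALTHY_MIN_T = 1.5   # healthy runs must progress past this stream-time (s)
GLITCH_SPLIT_T = 0.6  # seized runs stop below this; recovered runs pass it (s)


def check_healthy(last_t: float) -> bool:
    """Healthy graph: output reaches close to the end of the signal."""
    return last_t > HEALTHY_MIN_T


def check_seized(last_t: float, n_msgs: int) -> bool:
    """Seized graph: some output arrived (liveness), then progress stopped."""
    return n_msgs > 0 and last_t < GLITCH_SPLIT_T


def check_recovered(last_t: float) -> bool:
    """Recovered/composite graph: output continues past the glitch."""
    return last_t > GLITCH_SPLIT_T


# Measured values reported in this PR.
measured = {"healthy": 1.799, "seize": 0.449, "recover": 0.799}

assert check_healthy(measured["healthy"])
assert check_seized(measured["seize"], n_msgs=18)
assert check_recovered(measured["recover"])

# Distance of each measurement from its threshold, in seconds.
margins = {
    "healthy": measured["healthy"] - HEALTHY_MIN_T,
    "seize": GLITCH_SPLIT_T - measured["seize"],
    "recover": measured["recover"] - GLITCH_SPLIT_T,
}
print({name: round(m, 3) for name, m in margins.items()})
```

None of the checks take the message count or total sample count as a pass criterion, so the 108 vs 29 and 1710 vs 960 discrepancies between local and CI runs do not affect the outcome.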

Also

Raise the idle-gap TerminateOnTimeout from 2.0 s → 4.0 s so a transient CI stall can't open an output gap that truncates the tail — the one thing last_t is sensitive to.
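The interaction between an idle-gap timeout and `last_t` can be sketched with a toy model. This does not reproduce the real TerminateOnTimeout implementation: `run_until_idle_timeout`, the arrival times, and the 3 s stall are all hypothetical, chosen only to show why a longer timeout protects the tail.

```python
import math
from typing import List, Tuple


def run_until_idle_timeout(
    arrivals: List[Tuple[float, float]], timeout: float
) -> float:
    """Return last_t of the final message accepted before an idle gap
    between consecutive messages exceeds `timeout`.

    arrivals: (wall_clock_arrival_s, stream_time_of_last_sample_s) pairs,
    in arrival order.
    """
    last_t = math.nan
    prev_wall = None
    for wall, stream_t in arrivals:
        if prev_wall is not None and wall - prev_wall > timeout:
            break  # idle gap exceeded the timeout: collection stops here
        last_t = stream_t
        prev_wall = wall
    return last_t


# Hypothetical run with a 3 s stall before the final message arrives.
arrivals = [(0.1, 0.599), (0.2, 1.199), (3.2, 1.799)]

truncated = run_until_idle_timeout(arrivals, timeout=2.0)
complete = run_until_idle_timeout(arrivals, timeout=4.0)

print(truncated, complete)  # 1.199 1.799
```

In this model the 2.0 s timeout ends collection during the stall, leaving `last_t` at 1.199 s, which is below the healthy threshold of 1.5 s. The 4.0 s timeout rides out the stall and reaches 1.799 s.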

Same philosophy as 2ba329d (de-flake test_decimate_system): stop asserting on a quantity a termination/scheduling race makes non-deterministic.

Verification

pytest tests/integration/ezmsg/test_resample_merge_system.py — 4 passed, stable across repeated local runs.

test_resample_merge_healthy_system intermittently failed on Windows CI
("got 29" against `n_msgs >= 50`). The captured log shows subscriber
backpressure and a normal termination: the same data was delivered
coalesced into far fewer messages (108 locally vs 29 on a loaded runner),
so message count is a scheduling artifact.

Total samples is not a reliable substitute for the glitch/reset cases:
the reference-reset re-anchoring drops a timing-dependent amount of data,
so the recovered run measured 1710 samples locally but 960 on CI.

Assert instead on `last_t`, the stream-time of the final emitted sample --
how far through the signal the output reached. It is insensitive to both
coalescing and mid-stream drops, and only falls if the tail is truncated.
Measured dead-stable: healthy 1.799 s, seized 0.449 s, recovered/composite
0.799 s. Thresholds (healthy > 1.5, seized < 0.6, recovered > 0.6) sit
with wide margin between the regimes.

Also raise the idle-gap TerminateOnTimeout from 2.0 s to 4.0 s so a
transient CI stall cannot open an output gap that truncates the tail
(which is the one thing `last_t` is sensitive to).
cboulay force-pushed the deflake/resample-merge-system-msgcount branch from aa41fba to 23ca1a9 on July 2, 2026 01:43
cboulay merged commit 170291b into dev on Jul 2, 2026
25 of 26 checks passed
cboulay deleted the deflake/resample-merge-system-msgcount branch on July 2, 2026 02:28