De-flake test_resample_merge_system: assert samples, not message count by cboulay · Pull Request #165 · ezmsg-org/ezmsg-sigproc

cboulay · 2026-07-02T01:14:40Z

Summary

test_resample_merge_healthy_system intermittently fails on Windows CI (e.g. run 28556535265) with:

AssertionError: Healthy graph should produce ~60 merged messages, got 29.
assert 29 >= 50

The graph terminates normally under subscriber backpressure — nothing crashes. The tests asserted on quantities that a scheduling/reset race makes non-deterministic.

Why counts are unreliable

Message count is a coalescing artifact: under backpressure the same data arrives packed into a variable number of messages — measured 108 locally vs 29 on a loaded runner.
Total samples is not a safe substitute for the glitch cases: the reference-reset re-anchoring drops a timing-dependent amount of data, so the recovered run measured 1710 samples locally but 960 on CI (this is what a naive samples-based first attempt would have re-flaked on).
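The difference between the three quantities can be sketched with synthetic data. This is a minimal illustration, not the test's actual code: the 1 kHz sample rate is an assumption chosen to match the healthy row (1800 samples ending at 1.799 s), and `chunk` and `summarize` are hypothetical helpers.

```python
import numpy as np

FS = 1000.0  # assumed sample rate in Hz (illustrative, not taken from the test)


def chunk(sample_times, sizes):
    """Split a vector of sample times into consecutive messages of the given sizes."""
    edges = np.cumsum(sizes)[:-1]
    return np.split(sample_times, edges)


def summarize(msgs):
    """Return (message count, total samples, stream time of the last sample)."""
    n_msgs = len(msgs)
    total = sum(len(m) for m in msgs)
    last_t = float(msgs[-1][-1])
    return n_msgs, total, last_t


t = np.arange(1800) / FS  # 1800 samples spanning 0.000 s to 1.799 s

# Same data, two packings: many small messages vs. fewer coalesced ones.
light_load = chunk(t, [18] * 100)
heavy_load = chunk(t, [60] * 30)

# A mid-stream drop lowers the sample total but not the final timestamp.
dropped = [m for i, m in enumerate(heavy_load) if i not in (10, 11)]

light_summary = summarize(light_load)
heavy_summary = summarize(heavy_load)
dropped_summary = summarize(dropped)
```

Across the three cases the message count varies (100, 30, 28) and the total varies once samples are dropped (1800 vs 1680), while the last sample time is the same in all of them.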

The robust invariant: stream progress

Assert on last_t — the stream-time of the final emitted sample, i.e. how far through the signal the output reached. It's insensitive to coalescing and to mid-stream sample drops; it only falls if the tail is truncated. Measured dead-stable across runs:

test	n_msgs (unstable)	total samples (unstable for resets)	last_t in s (stable)
healthy	108 (29 on CI)	1800	1.799
seize	18	450	0.449
recover	60	1710 (960 on CI)	0.799
resampleconcat	61	1740	0.799

Thresholds sit with a wide margin between the regimes: healthy > 1.5 s, seize < 0.6 s, recover/composite > 0.6 s (the seized run stops at the glitch, ~0.45 s; recovered runs continue to the re-anchored end, ~0.8 s). Channel-count checks are kept; n_msgs > 0 is retained only as a liveness sanity check in the seize case.
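A threshold check of this shape might look like the sketch below. The `(offset_s, fs_hz, n_samples)` message tuple, the `last_t` helper, and the constant names are hypothetical stand-ins; the real test reads these values from the messages the graph emits.

```python
# Thresholds quoted in this PR, in seconds of stream time.
HEALTHY_MIN_T = 1.5
SEIZE_MAX_T = 0.6
RECOVER_MIN_T = 0.6


def last_t(msgs):
    """Stream time of the final emitted sample.

    Each message is a hypothetical (offset_s, fs_hz, n_samples) tuple:
    the time of its first sample, its sample rate, and its length.
    """
    offset_s, fs_hz, n_samples = msgs[-1]
    return offset_s + (n_samples - 1) / fs_hz


# Two packings of the same healthy stream, plus a seized run.
healthy_few = [(0.0, 1000.0, 900), (0.9, 1000.0, 900)]
healthy_many = [(i * 0.1, 1000.0, 100) for i in range(18)]
seized = [(0.0, 1000.0, 450)]

assert last_t(healthy_few) > HEALTHY_MIN_T
assert last_t(healthy_many) > HEALTHY_MIN_T
assert last_t(seized) < SEIZE_MAX_T
assert len(seized) > 0  # liveness only; the count itself is not asserted
```

Only the final message contributes to `last_t`, which is why neither the packing nor a mid-stream drop moves it.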

Also

Raise the idle-gap TerminateOnTimeout from 2.0 s → 4.0 s so a transient CI stall can't open an output gap that truncates the tail — the one thing last_t is sensitive to.
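The effect of the idle-gap timeout on the tail can be illustrated with a small simulation. This is not ezmsg's TerminateOnTimeout implementation; `delivered_before_timeout` is a hypothetical model that ends the run once the gap between consecutive arrivals reaches the timeout, and the arrival times are invented.

```python
def delivered_before_timeout(arrival_times, idle_timeout):
    """Return the arrivals seen before an idle gap of idle_timeout seconds ends the run."""
    seen = []
    last = 0.0  # the run starts at t = 0
    for t in arrival_times:
        if t - last >= idle_timeout:
            break  # the timeout fired during the gap; later messages are lost
        seen.append(t)
        last = t
    return seen


# Wall-clock arrival times with a 3.0 s stall between the third and fourth message.
arrivals = [0.1, 0.2, 0.3, 3.3, 3.4]

short = delivered_before_timeout(arrivals, idle_timeout=2.0)  # tail truncated
long = delivered_before_timeout(arrivals, idle_timeout=4.0)   # stall tolerated
```

With the 2.0 s timeout the run ends during the stall and the last two messages never arrive, which would pull `last_t` down; with 4.0 s all five are delivered.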

Same philosophy as 2ba329d (de-flake test_decimate_system): stop asserting on a quantity a termination/scheduling race makes non-deterministic.

Verification

pytest tests/integration/ezmsg/test_resample_merge_system.py — 4 passed, stable across repeated local runs.

test_resample_merge_healthy_system intermittently failed on Windows CI ("got 29" against `n_msgs >= 50`). The captured log shows subscriber backpressure and a normal termination: the same data was delivered coalesced into far fewer messages (108 locally vs 29 on a loaded runner), so message count is a scheduling artifact.

Total samples is not a reliable substitute for the glitch/reset cases: the reference-reset re-anchoring drops a timing-dependent amount of data, so the recovered run measured 1710 samples locally but 960 on CI.

Assert instead on `last_t`, the stream-time of the final emitted sample -- how far through the signal the output reached. It is insensitive to both coalescing and mid-stream drops, and only falls if the tail is truncated. Measured dead-stable: healthy 1.799 s, seized 0.449 s, recovered/composite 0.799 s. Thresholds (healthy > 1.5, seized < 0.6, recovered > 0.6) sit with wide margin between the regimes.

Also raise the idle-gap TerminateOnTimeout from 2.0 s to 4.0 s so a transient CI stall cannot open an output gap that truncates the tail (which is the one thing `last_t` is sensitive to).

cboulay force-pushed the deflake/resample-merge-system-msgcount branch from aa41fba to 23ca1a9 Compare July 2, 2026 01:43

cboulay merged commit 170291b into dev Jul 2, 2026
25 of 26 checks passed

cboulay deleted the deflake/resample-merge-system-msgcount branch July 2, 2026 02:28
