From 2c690a0f7b540a64b99503c8bc99df7044fd6bb9 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Sun, 12 Apr 2026 12:00:24 -0700 Subject: [PATCH 01/23] flaky-test-audit: initialize tracking Add flaky-tests/audit-log.md to track CI runs on this branch. Any test failure here is a flaky test since no test logic has been modified from master. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 flaky-tests/audit-log.md diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md new file mode 100644 index 0000000000..17b9c84088 --- /dev/null +++ b/flaky-tests/audit-log.md @@ -0,0 +1,6 @@ +# Flaky Test Audit Log + +This file tracks every CI run on the `flaky-test-audit` branch. Any test failure on this branch is a flaky test since no test logic has been modified from `master`. + +| Timestamp | Result | Details | CI Job | +|-----------|--------|---------|--------| From 55fcbc14a39354c78e5cff4da321195c013b23a1 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Sun, 12 Apr 2026 12:25:22 -0700 Subject: [PATCH 02/23] flaky-test-audit: TestDistributorQuerier_QueryIngestersWithinBoundary Detected flaky test in ci run 24314068155. The subtest maxT_well_after_lookback_boundary failed under -race on amd64 but passed on arm64 and without -race. Root cause is a timing sensitivity where time.Now() drifts between test setup and code under test. 
Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- ...torQuerier_QueryIngestersWithinBoundary.md | 30 +++++++++++++++++++ flaky-tests/audit-log.md | 1 + 2 files changed, 31 insertions(+) create mode 100644 flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md diff --git a/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md new file mode 100644 index 0000000000..7699391e28 --- /dev/null +++ b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md @@ -0,0 +1,30 @@ +# Flaky Test: TestDistributorQuerier_QueryIngestersWithinBoundary + +**Status**: Active +**Occurrences**: 1 +**Root Cause**: Timing-sensitive test. `now` is captured at test setup (line 561) via `time.Now()`, but the code under test in `newDistributorQueryable` also calls `time.Now()` internally to compute the lookback boundary. Under the race detector (slower execution), enough wall-clock time elapses between setup and execution that the internal boundary shifts. The "maxT well after lookback boundary" subtest uses only a 10-second margin (`now.Add(-lookback + 10*time.Second)`), which is not enough buffer when the race detector slows execution. The result is that `queryMaxT` falls before the internally-computed boundary, the query is skipped, and `distributor.Calls` is empty instead of having 1 call. + +## Occurrences + +### 2026-04-12T19:09:52Z +- **Job**: [ci / test (amd64)](https://github.com/cortexproject/cortex/actions/runs/24314068155) +- **Package**: `github.com/cortexproject/cortex/pkg/querier` +- **File**: `pkg/querier/distributor_queryable_test.go:638` +- **Subtest**: `maxT_well_after_lookback_boundary` +- **Notes**: Passed on arm64, passed on test-no-race (amd64), failed only under `-race` on amd64. + +
<details><summary>Build logs</summary> + +``` +--- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary (0.00s) + --- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (0.01s) + distributor_queryable_test.go:638: + Error Trace: /__w/cortex/cortex/pkg/querier/distributor_queryable_test.go:638 + Error: "[]" should have 1 item(s), but has 0 + Test: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary + Messages: should manipulate when maxT is well after boundary +FAIL +FAIL github.com/cortexproject/cortex/pkg/querier 34.451s +``` + +</details>
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 17b9c84088..e07a813e88 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -4,3 +4,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | Timestamp | Result | Details | CI Job | |-----------|--------|---------|--------| +| 2026-04-12T19:09:52Z | FLAKY | TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (pkg/querier) | [ci run 24314068155](https://github.com/cortexproject/cortex/actions/runs/24314068155) | From 32bbfbae4263ac0daffaadeba620836c03a28d6c Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Sun, 12 Apr 2026 12:47:18 -0700 Subject: [PATCH 03/23] flaky-test-audit: run #2 passed, no flaky tests CI run 24314518781 completed with all jobs passing. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index e07a813e88..e347dd2fed 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -5,3 +5,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | Timestamp | Result | Details | CI Job | |-----------|--------|---------|--------| | 2026-04-12T19:09:52Z | FLAKY | TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (pkg/querier) | [ci run 24314068155](https://github.com/cortexproject/cortex/actions/runs/24314068155) | +| 2026-04-12T19:47:00Z | PASS | No flaky tests detected | [ci run 24314518781](https://github.com/cortexproject/cortex/actions/runs/24314518781) | From 2d57212ed551bb15edea9c6d4d2faf4549b8c09d Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Sun, 12 Apr 2026 13:25:10 -0700 Subject: [PATCH 04/23] flaky-test-audit: TestQueueConcurrency timeout on arm64 Detected flaky test in ci run 24314927948. 
TestQueueConcurrency in pkg/scheduler/queue timed out after 30m on arm64 with -race. Root cause is a deadlock where dequeueRequest blocks forever on a channel when the queue is drained or deleted by concurrent goroutines. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/TestQueueConcurrency.md | 31 +++++++++++++++++++++++++++++ flaky-tests/audit-log.md | 1 + 2 files changed, 32 insertions(+) create mode 100644 flaky-tests/TestQueueConcurrency.md diff --git a/flaky-tests/TestQueueConcurrency.md b/flaky-tests/TestQueueConcurrency.md new file mode 100644 index 0000000000..cb7b585ce2 --- /dev/null +++ b/flaky-tests/TestQueueConcurrency.md @@ -0,0 +1,31 @@ +# Flaky Test: TestQueueConcurrency + +**Status**: Active +**Occurrences**: 1 +**Root Cause**: Deadlock/hang in concurrent test. The test spawns 30 goroutines that concurrently enqueue, dequeue, and delete from a queue. Goroutines where `cnt%5 == 0` (and odd) call `dequeueRequest(0, false)` which blocks on a channel waiting for an item. If other goroutines have already drained the queue or called `deleteQueue`, the dequeue goroutine blocks forever — there will never be another enqueue to unblock it. This causes the entire test to hang until the 30-minute timeout. The issue manifests more on arm64 with `-race` due to slower execution and different goroutine scheduling. + +## Occurrences + +### 2026-04-12T20:23:09Z +- **Job**: [ci / test (arm64)](https://github.com/cortexproject/cortex/actions/runs/24314927948) +- **Package**: `github.com/cortexproject/cortex/pkg/scheduler/queue` +- **File**: `pkg/scheduler/queue/user_queues_test.go:461` +- **Notes**: Timed out after 30m on arm64 with `-race`. Passed on amd64 with `-race`, passed on both arches without `-race`. Goroutine dump shows `dequeueRequest` stuck waiting on channel at `user_request_queue.go:35`. + +
<details><summary>Build logs</summary> + +``` +panic: test timed out after 30m0s + running tests: + TestQueueConcurrency (29m51s) + +goroutine 100 [chan receive, 29 minutes]: +github.com/cortexproject/cortex/pkg/scheduler/queue.(*FIFORequestQueue).dequeueRequest(0xc000410080, 0xc8ea31?, 0x6?) + /__w/cortex/cortex/pkg/scheduler/queue/user_request_queue.go:35 +0x48 +github.com/cortexproject/cortex/pkg/scheduler/queue.TestQueueConcurrency.func1(0xf) + /__w/cortex/cortex/pkg/scheduler/queue/user_queues_test.go:477 +0x280 + +FAIL github.com/cortexproject/cortex/pkg/scheduler/queue 1800.101s +``` + +</details>
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index e347dd2fed..6c8e62c321 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -6,3 +6,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure |-----------|--------|---------|--------| | 2026-04-12T19:09:52Z | FLAKY | TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (pkg/querier) | [ci run 24314068155](https://github.com/cortexproject/cortex/actions/runs/24314068155) | | 2026-04-12T19:47:00Z | PASS | No flaky tests detected | [ci run 24314518781](https://github.com/cortexproject/cortex/actions/runs/24314518781) | +| 2026-04-12T20:23:09Z | FLAKY | TestQueueConcurrency (pkg/scheduler/queue) — timed out after 30m on arm64 | [ci run 24314927948](https://github.com/cortexproject/cortex/actions/runs/24314927948) | From 376d067a988b5a16621b023a8e60e9a45eac67c9 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Sun, 12 Apr 2026 13:47:43 -0700 Subject: [PATCH 05/23] flaky-test-audit: TestDistributorQuerier_QueryIngestersWithinBoundary occurrence #2 CI run 24315645679: same timing-sensitive test failed again on amd64 with -race. This is occurrence #2 of 3 before auto-skip. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- ...torQuerier_QueryIngestersWithinBoundary.md | 25 ++++++++++++++++++- flaky-tests/audit-log.md | 1 + 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md index 7699391e28..977f16f96b 100644 --- a/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md +++ b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md @@ -1,7 +1,7 @@ # Flaky Test: TestDistributorQuerier_QueryIngestersWithinBoundary **Status**: Active -**Occurrences**: 1 +**Occurrences**: 2 **Root Cause**: Timing-sensitive test. 
`now` is captured at test setup (line 561) via `time.Now()`, but the code under test in `newDistributorQueryable` also calls `time.Now()` internally to compute the lookback boundary. Under the race detector (slower execution), enough wall-clock time elapses between setup and execution that the internal boundary shifts. The "maxT well after lookback boundary" subtest uses only a 10-second margin (`now.Add(-lookback + 10*time.Second)`), which is not enough buffer when the race detector slows execution. The result is that `queryMaxT` falls before the internally-computed boundary, the query is skipped, and `distributor.Calls` is empty instead of having 1 call. ## Occurrences @@ -28,3 +28,26 @@ FAIL github.com/cortexproject/cortex/pkg/querier 34.451s ``` + +### 2026-04-12T20:32:32Z +- **Job**: [ci / test (amd64)](https://github.com/cortexproject/cortex/actions/runs/24315645679) +- **Package**: `github.com/cortexproject/cortex/pkg/querier` +- **File**: `pkg/querier/distributor_queryable_test.go:638` +- **Subtest**: `maxT_well_after_lookback_boundary` +- **Notes**: Same failure, again on amd64 with `-race`. Passed on arm64 and test-no-race. + +
<details><summary>Build logs</summary> + +``` +--- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary (0.00s) + --- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (0.00s) + distributor_queryable_test.go:638: + Error Trace: /__w/cortex/cortex/pkg/querier/distributor_queryable_test.go:638 + Error: "[]" should have 1 item(s), but has 0 + Test: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary + Messages: should manipulate when maxT is well after boundary +FAIL +FAIL github.com/cortexproject/cortex/pkg/querier 33.505s +``` + +</details>
\ No newline at end of file diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 6c8e62c321..20008ca1b8 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -7,3 +7,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-12T19:09:52Z | FLAKY | TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (pkg/querier) | [ci run 24314068155](https://github.com/cortexproject/cortex/actions/runs/24314068155) | | 2026-04-12T19:47:00Z | PASS | No flaky tests detected | [ci run 24314518781](https://github.com/cortexproject/cortex/actions/runs/24314518781) | | 2026-04-12T20:23:09Z | FLAKY | TestQueueConcurrency (pkg/scheduler/queue) — timed out after 30m on arm64 | [ci run 24314927948](https://github.com/cortexproject/cortex/actions/runs/24314927948) | +| 2026-04-12T20:32:32Z | FLAKY | TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (pkg/querier) — occurrence #2 | [ci run 24315645679](https://github.com/cortexproject/cortex/actions/runs/24315645679) | From edea790faea3a583fb05c5f6b69add7447400c17 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Sun, 12 Apr 2026 13:49:48 -0700 Subject: [PATCH 06/23] flaky-test-audit: skip TestDistributorQuerier_QueryIngestersWithinBoundary and TestQueueConcurrency Auto-skip flaky tests after first occurrence: - TestDistributorQuerier_QueryIngestersWithinBoundary: timing-sensitive test where time.Now() drifts between test setup and code under test (2 occurrences on amd64 with -race) - TestQueueConcurrency: deadlock where dequeueRequest blocks forever when queue is drained/deleted by concurrent goroutines (1 occurrence on arm64 with -race, 30m timeout) Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- .../TestDistributorQuerier_QueryIngestersWithinBoundary.md | 2 +- flaky-tests/TestQueueConcurrency.md | 2 +- pkg/querier/distributor_queryable_test.go | 1 + 
pkg/scheduler/queue/user_queues_test.go | 1 + 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md index 977f16f96b..8a1956e46e 100644 --- a/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md +++ b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md @@ -1,6 +1,6 @@ # Flaky Test: TestDistributorQuerier_QueryIngestersWithinBoundary -**Status**: Active +**Status**: Skipped **Occurrences**: 2 **Root Cause**: Timing-sensitive test. `now` is captured at test setup (line 561) via `time.Now()`, but the code under test in `newDistributorQueryable` also calls `time.Now()` internally to compute the lookback boundary. Under the race detector (slower execution), enough wall-clock time elapses between setup and execution that the internal boundary shifts. The "maxT well after lookback boundary" subtest uses only a 10-second margin (`now.Add(-lookback + 10*time.Second)`), which is not enough buffer when the race detector slows execution. The result is that `queryMaxT` falls before the internally-computed boundary, the query is skipped, and `distributor.Calls` is empty instead of having 1 call. diff --git a/flaky-tests/TestQueueConcurrency.md b/flaky-tests/TestQueueConcurrency.md index cb7b585ce2..4d61b96787 100644 --- a/flaky-tests/TestQueueConcurrency.md +++ b/flaky-tests/TestQueueConcurrency.md @@ -1,6 +1,6 @@ # Flaky Test: TestQueueConcurrency -**Status**: Active +**Status**: Skipped **Occurrences**: 1 **Root Cause**: Deadlock/hang in concurrent test. The test spawns 30 goroutines that concurrently enqueue, dequeue, and delete from a queue. Goroutines where `cnt%5 == 0` (and odd) call `dequeueRequest(0, false)` which blocks on a channel waiting for an item. 
If other goroutines have already drained the queue or called `deleteQueue`, the dequeue goroutine blocks forever — there will never be another enqueue to unblock it. This causes the entire test to hang until the 30-minute timeout. The issue manifests more on arm64 with `-race` due to slower execution and different goroutine scheduling. diff --git a/pkg/querier/distributor_queryable_test.go b/pkg/querier/distributor_queryable_test.go index 810185044d..14565db491 100644 --- a/pkg/querier/distributor_queryable_test.go +++ b/pkg/querier/distributor_queryable_test.go @@ -556,6 +556,7 @@ func TestDistributorQuerier_LabelNames(t *testing.T) { } } func TestDistributorQuerier_QueryIngestersWithinBoundary(t *testing.T) { + t.Skip("Flaky test auto-skipped. See flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md") t.Parallel() now := time.Now() diff --git a/pkg/scheduler/queue/user_queues_test.go b/pkg/scheduler/queue/user_queues_test.go index 382a1c2eaf..fe98f0e9c0 100644 --- a/pkg/scheduler/queue/user_queues_test.go +++ b/pkg/scheduler/queue/user_queues_test.go @@ -459,6 +459,7 @@ func TestGetOrAddQueueShouldUpdateProperties(t *testing.T) { } func TestQueueConcurrency(t *testing.T) { + t.Skip("Flaky test auto-skipped. See flaky-tests/TestQueueConcurrency.md") const numGoRoutines = 30 limits := MockLimits{ MaxOutstanding: 50, From fa3f4d3b749e6a1b77aaa06be0096f57d79511f0 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Sun, 12 Apr 2026 14:25:16 -0700 Subject: [PATCH 07/23] flaky-test-audit: skip TestQuerierWithBlocksStorageRunningInSingleBinaryMode New flaky test from ci run 24316060467: integration test failed on arm64 due to Docker container (e2e-cortex-test-consul) disappearing mid-test. Transient CI infrastructure issue, not a code bug. Also updated audit log with run #5 and #6 results. 
Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- ...hBlocksStorageRunningInSingleBinaryMode.md | 28 +++++++++++++++++++ flaky-tests/audit-log.md | 2 ++ integration/querier_test.go | 1 + 3 files changed, 31 insertions(+) create mode 100644 flaky-tests/TestQuerierWithBlocksStorageRunningInSingleBinaryMode.md diff --git a/flaky-tests/TestQuerierWithBlocksStorageRunningInSingleBinaryMode.md b/flaky-tests/TestQuerierWithBlocksStorageRunningInSingleBinaryMode.md new file mode 100644 index 0000000000..efe64f666a --- /dev/null +++ b/flaky-tests/TestQuerierWithBlocksStorageRunningInSingleBinaryMode.md @@ -0,0 +1,28 @@ +# Flaky Test: TestQuerierWithBlocksStorageRunningInSingleBinaryMode + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Transient Docker infrastructure failure. The Consul container (`e2e-cortex-test-consul`) disappeared mid-test, causing `Error response from daemon: No such container: e2e-cortex-test-consul`. This is a CI environment issue, not a code bug. The test passed on amd64 in the same run and passed on arm64 in run #6. + +## Occurrences + +### 2026-04-12T21:00:08Z +- **Job**: [ci / integration (ubuntu-24.04-arm, arm64, integration_querier)](https://github.com/cortexproject/cortex/actions/runs/24316060467) +- **Package**: `github.com/cortexproject/cortex/integration` +- **File**: `integration/querier_test.go:217` +- **Subtest**: `[TSDB]_blocks_sharding_enabled,_redis_index_cache,_bucket_index_enabled,thanosEngine=false` +- **Notes**: Passed on amd64 in the same run. Docker container `e2e-cortex-test-consul` vanished during test execution. + +
<details><summary>Build logs</summary> + +``` +20:57:39 Error response from daemon: No such container: e2e-cortex-test-consul + Error Trace: /home/runner/work/cortex/cortex/integration/querier_test.go:217 + Error: Received unexpected error: +--- FAIL: TestQuerierWithBlocksStorageRunningInSingleBinaryMode (69.50s) + --- FAIL: TestQuerierWithBlocksStorageRunningInSingleBinaryMode/[TSDB]_blocks_sharding_enabled,_redis_index_cache,_bucket_index_enabled,thanosEngine=false (34.33s) +FAIL +FAIL github.com/cortexproject/cortex/integration 279.595s +``` + +</details>
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 20008ca1b8..c6c44d30ed 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -8,3 +8,5 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-12T19:47:00Z | PASS | No flaky tests detected | [ci run 24314518781](https://github.com/cortexproject/cortex/actions/runs/24314518781) | | 2026-04-12T20:23:09Z | FLAKY | TestQueueConcurrency (pkg/scheduler/queue) — timed out after 30m on arm64 | [ci run 24314927948](https://github.com/cortexproject/cortex/actions/runs/24314927948) | | 2026-04-12T20:32:32Z | FLAKY | TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (pkg/querier) — occurrence #2 | [ci run 24315645679](https://github.com/cortexproject/cortex/actions/runs/24315645679) | +| 2026-04-12T20:49:56Z | FLAKY | TestQuerierWithBlocksStorageRunningInSingleBinaryMode (integration/querier, arm64) — Docker container vanished; TestQueueConcurrency timeout (arm64) — already skipped | [ci run 24316060467](https://github.com/cortexproject/cortex/actions/runs/24316060467) | +| 2026-04-12T21:22:00Z | PASS | No flaky tests detected (with skips applied) | [ci run 24316099145](https://github.com/cortexproject/cortex/actions/runs/24316099145) | diff --git a/integration/querier_test.go b/integration/querier_test.go index 5b3ba40df7..9296e789c2 100644 --- a/integration/querier_test.go +++ b/integration/querier_test.go @@ -31,6 +31,7 @@ import ( ) func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) { + t.Skip("Flaky test auto-skipped. 
See flaky-tests/TestQuerierWithBlocksStorageRunningInSingleBinaryMode.md") tests := map[string]struct { bucketStorageType string blocksShardingEnabled bool From 7417808485c9a2a45c7ae58dbf305f35332e8f8b Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Mon, 13 Apr 2026 13:32:34 -0700 Subject: [PATCH 08/23] flaky-test-audit: skip TestQuerierWithStoreGatewayDataBytesLimits New flaky test from ci run 24316771541: integration test failed on amd64 due to widespread Docker container disappearance mid-test. Same transient CI infrastructure pattern as TestQuerierWithBlocksStorageRunningInSingleBinaryMode. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- ...tQuerierWithStoreGatewayDataBytesLimits.md | 30 +++++++++++++++++++ flaky-tests/audit-log.md | 1 + integration/querier_test.go | 1 + 3 files changed, 32 insertions(+) create mode 100644 flaky-tests/TestQuerierWithStoreGatewayDataBytesLimits.md diff --git a/flaky-tests/TestQuerierWithStoreGatewayDataBytesLimits.md b/flaky-tests/TestQuerierWithStoreGatewayDataBytesLimits.md new file mode 100644 index 0000000000..638304e487 --- /dev/null +++ b/flaky-tests/TestQuerierWithStoreGatewayDataBytesLimits.md @@ -0,0 +1,30 @@ +# Flaky Test: TestQuerierWithStoreGatewayDataBytesLimits + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Transient Docker infrastructure failure. Multiple containers vanished mid-test (`e2e-cortex-test-consul`, `e2e-cortex-test-distributor`, `e2e-cortex-test-minio-9000`, `e2e-cortex-test-store-gateway`, `e2e-cortex-test-ingester`, `e2e-cortex-test-querier`). Same infrastructure instability pattern as TestQuerierWithBlocksStorageRunningInSingleBinaryMode. The test passed on arm64 in the same run and on both arches in runs #1-#4. 
+ +## Occurrences + +### 2026-04-12T21:37:26Z +- **Job**: [ci / integration (ubuntu-24.04, amd64, integration_querier)](https://github.com/cortexproject/cortex/actions/runs/24316771541) +- **Package**: `github.com/cortexproject/cortex/integration` +- **File**: `integration/querier_test.go:565` +- **Notes**: Widespread Docker container disappearance. All `e2e-cortex-test-*` containers returned "No such container" errors. + +
<details><summary>Build logs</summary> + +``` +21:34:11 Error response from daemon: No such container: e2e-cortex-test-distributor +21:34:11 Error response from daemon: No such container: e2e-cortex-test-minio-9000 +21:34:11 Error response from daemon: No such container: e2e-cortex-test-store-gateway +21:34:11 Error response from daemon: No such container: e2e-cortex-test-ingester +21:34:11 Error response from daemon: No such container: e2e-cortex-test-querier +21:34:19 Error response from daemon: No such container: e2e-cortex-test-consul + Error Trace: /home/runner/work/cortex/cortex/integration/querier_test.go:565 + Error: Not equal: +--- FAIL: TestQuerierWithStoreGatewayDataBytesLimits (10.69s) +FAIL github.com/cortexproject/cortex/integration 202.827s +``` + +</details>
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index c6c44d30ed..d3f715df69 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -10,3 +10,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-12T20:32:32Z | FLAKY | TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (pkg/querier) — occurrence #2 | [ci run 24315645679](https://github.com/cortexproject/cortex/actions/runs/24315645679) | | 2026-04-12T20:49:56Z | FLAKY | TestQuerierWithBlocksStorageRunningInSingleBinaryMode (integration/querier, arm64) — Docker container vanished; TestQueueConcurrency timeout (arm64) — already skipped | [ci run 24316060467](https://github.com/cortexproject/cortex/actions/runs/24316060467) | | 2026-04-12T21:22:00Z | PASS | No flaky tests detected (with skips applied) | [ci run 24316099145](https://github.com/cortexproject/cortex/actions/runs/24316099145) | +| 2026-04-12T21:37:26Z | FLAKY | TestQuerierWithStoreGatewayDataBytesLimits (integration/querier, amd64) — Docker containers vanished | [ci run 24316771541](https://github.com/cortexproject/cortex/actions/runs/24316771541) | diff --git a/integration/querier_test.go b/integration/querier_test.go index 9296e789c2..f4a17a665c 100644 --- a/integration/querier_test.go +++ b/integration/querier_test.go @@ -485,6 +485,7 @@ func TestQuerierWithBlocksStorageLimits(t *testing.T) { } func TestQuerierWithStoreGatewayDataBytesLimits(t *testing.T) { + t.Skip("Flaky test auto-skipped. See flaky-tests/TestQuerierWithStoreGatewayDataBytesLimits.md") const blockRangePeriod = 5 * time.Second s, err := e2e.NewScenario(networkName) From f690b2ef562de17ef65038285aeb5e4eac9e820b Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Mon, 13 Apr 2026 13:35:51 -0700 Subject: [PATCH 09/23] flaky-test-audit: remove skip for TestDistributorQuerier_QueryIngestersWithinBoundary Upstream fix (#7419) injected a clock to eliminate timing drift. 
Removing our t.Skip since the root cause is properly fixed. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- .../TestDistributorQuerier_QueryIngestersWithinBoundary.md | 2 +- pkg/querier/distributor_queryable_test.go | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md index 8a1956e46e..c8fa72b471 100644 --- a/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md +++ b/flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md @@ -1,6 +1,6 @@ # Flaky Test: TestDistributorQuerier_QueryIngestersWithinBoundary -**Status**: Skipped +**Status**: Fixed **Occurrences**: 2 **Root Cause**: Timing-sensitive test. `now` is captured at test setup (line 561) via `time.Now()`, but the code under test in `newDistributorQueryable` also calls `time.Now()` internally to compute the lookback boundary. Under the race detector (slower execution), enough wall-clock time elapses between setup and execution that the internal boundary shifts. The "maxT well after lookback boundary" subtest uses only a 10-second margin (`now.Add(-lookback + 10*time.Second)`), which is not enough buffer when the race detector slows execution. The result is that `queryMaxT` falls before the internally-computed boundary, the query is skipped, and `distributor.Calls` is empty instead of having 1 call. diff --git a/pkg/querier/distributor_queryable_test.go b/pkg/querier/distributor_queryable_test.go index d1f2ccc244..74a6f84ca9 100644 --- a/pkg/querier/distributor_queryable_test.go +++ b/pkg/querier/distributor_queryable_test.go @@ -556,7 +556,6 @@ func TestDistributorQuerier_LabelNames(t *testing.T) { } } func TestDistributorQuerier_QueryIngestersWithinBoundary(t *testing.T) { - t.Skip("Flaky test auto-skipped. 
See flaky-tests/TestDistributorQuerier_QueryIngestersWithinBoundary.md") t.Parallel() now := time.Now() From ffcb29c2423d1825daab51ab893d533e0d7aadf2 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Mon, 13 Apr 2026 13:56:52 -0700 Subject: [PATCH 10/23] flaky-test-audit: skip TestRuler_rules_limit and TestParquetFuzz MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New flaky tests from ci run 24365433548: - TestRuler_rules_limit: alert state race — test expects "unknown" but ruler evaluates to "inactive" before assertion under -race - TestParquetFuzz: non-deterministic fuzz test, 1 of N random queries failed Also: requires_docker job failed due to Docker install infra issue (not a test failure). Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/TestParquetFuzz.md | 26 ++++++++++++++++++++++++++ flaky-tests/TestRuler_rules_limit.md | 28 ++++++++++++++++++++++++++++ flaky-tests/audit-log.md | 1 + integration/parquet_querier_test.go | 1 + pkg/ruler/api_test.go | 1 + 5 files changed, 57 insertions(+) create mode 100644 flaky-tests/TestParquetFuzz.md create mode 100644 flaky-tests/TestRuler_rules_limit.md diff --git a/flaky-tests/TestParquetFuzz.md b/flaky-tests/TestParquetFuzz.md new file mode 100644 index 0000000000..bdc5b499f8 --- /dev/null +++ b/flaky-tests/TestParquetFuzz.md @@ -0,0 +1,26 @@ +# Flaky Test: TestParquetFuzz + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Fuzz test with non-deterministic query generation. The test generates random PromQL queries and compares Cortex results against Prometheus. 1 out of N randomly generated test cases failed, likely due to floating point precision differences or edge cases in query evaluation. This is inherently non-deterministic. 
+ +## Occurrences + +### 2026-04-13T20:44:25Z +- **Job**: [ci / integration (ubuntu-24.04, amd64, integration_query_fuzz)](https://github.com/cortexproject/cortex/actions/runs/24365433548) +- **Package**: `github.com/cortexproject/cortex/integration` +- **File**: `integration/parquet_querier_test.go:176` +- **Notes**: "1 test cases failed" out of the fuzz suite. Non-deterministic by nature. + +
<details><summary>Build logs</summary> + +``` + Error Trace: /home/runner/work/cortex/cortex/integration/parquet_querier_test.go:176 + Error: finished query fuzzing tests + Test: TestParquetFuzz + Messages: 1 test cases failed +--- FAIL: TestParquetFuzz (24.12s) +FAIL github.com/cortexproject/cortex/integration 200.599s +``` + +</details>
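The build log reports only that one fuzz case failed, not which one. A minimal sketch of a common way to make such failures reproducible follows, assuming the random inputs can be derived from a single seed that is printed with each run. `generateCases` and the integer "cases" are hypothetical stand-ins for the real PromQL query generator, and nothing here is taken from the Cortex fuzz harness.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// generateCases is a hypothetical stand-in for a fuzz input generator.
// All randomness flows from the seed, so a given seed always yields
// the same sequence of cases.
func generateCases(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed))
	cases := make([]int, n)
	for i := range cases {
		cases[i] = rng.Intn(1000)
	}
	return cases
}

func main() {
	// Pick a fresh seed per run, but record it so that a failing run
	// can be replayed with exactly the same inputs.
	seed := time.Now().UnixNano()
	fmt.Println("fuzz seed:", seed)

	first := generateCases(seed, 5)
	replay := generateCases(seed, 5)
	fmt.Println(first)
	fmt.Println(replay)
}
```

Replaying a recorded seed would turn a "1 test cases failed" report into a specific query that can be investigated, which helps separate genuine result mismatches from tolerance issues.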
diff --git a/flaky-tests/TestRuler_rules_limit.md b/flaky-tests/TestRuler_rules_limit.md new file mode 100644 index 0000000000..1bd6bf9bae --- /dev/null +++ b/flaky-tests/TestRuler_rules_limit.md @@ -0,0 +1,28 @@ +# Flaky Test: TestRuler_rules_limit + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Race condition in alert state. The test expects `"state":"unknown"` for an alerting rule but gets `"state":"inactive"`. The ruler evaluates rules asynchronously after startup, and under the race detector the alert state transitions from "unknown" to "inactive" before the HTTP response is captured. This is a timing-dependent assertion. + +## Occurrences + +### 2026-04-13T20:43:11Z +- **Job**: [ci / test (amd64)](https://github.com/cortexproject/cortex/actions/runs/24365433548) +- **Package**: `github.com/cortexproject/cortex/pkg/ruler` +- **File**: `pkg/ruler/api_test.go:422` +- **Notes**: Failed on amd64 with `-race`. The diff is `"state":"unknown"` (expected) vs `"state":"inactive"` (actual). + +
<details><summary>Build logs</summary> + +``` +--- FAIL: TestRuler_rules_limit (0.06s) + api_test.go:422: + Error Trace: /__w/cortex/cortex/pkg/ruler/api_test.go:422 + Error: Not equal: + expected: ..."state":"unknown"... + actual : ..."state":"inactive"... + Test: TestRuler_rules_limit +FAIL github.com/cortexproject/cortex/pkg/ruler 43.279s +``` + +</details>
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index d3f715df69..1e1230151f 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -11,3 +11,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-12T20:49:56Z | FLAKY | TestQuerierWithBlocksStorageRunningInSingleBinaryMode (integration/querier, arm64) — Docker container vanished; TestQueueConcurrency timeout (arm64) — already skipped | [ci run 24316060467](https://github.com/cortexproject/cortex/actions/runs/24316060467) | | 2026-04-12T21:22:00Z | PASS | No flaky tests detected (with skips applied) | [ci run 24316099145](https://github.com/cortexproject/cortex/actions/runs/24316099145) | | 2026-04-12T21:37:26Z | FLAKY | TestQuerierWithStoreGatewayDataBytesLimits (integration/querier, amd64) — Docker containers vanished | [ci run 24316771541](https://github.com/cortexproject/cortex/actions/runs/24316771541) | +| 2026-04-13T20:43:11Z | FLAKY | TestRuler_rules_limit (pkg/ruler) — alert state race; TestParquetFuzz (integration/query_fuzz) — fuzz non-determinism; requires_docker — Docker install failure | [ci run 24365433548](https://github.com/cortexproject/cortex/actions/runs/24365433548) | diff --git a/integration/parquet_querier_test.go b/integration/parquet_querier_test.go index 2c3d8b9256..3022d0feeb 100644 --- a/integration/parquet_querier_test.go +++ b/integration/parquet_querier_test.go @@ -33,6 +33,7 @@ import ( ) func TestParquetFuzz(t *testing.T) { + t.Skip("Flaky test auto-skipped. See flaky-tests/TestParquetFuzz.md") s, err := e2e.NewScenario(networkName) require.NoError(t, err) diff --git a/pkg/ruler/api_test.go b/pkg/ruler/api_test.go index d37084a3e4..881866281c 100644 --- a/pkg/ruler/api_test.go +++ b/pkg/ruler/api_test.go @@ -365,6 +365,7 @@ func TestRuler_rules_special_characters(t *testing.T) { } func TestRuler_rules_limit(t *testing.T) { + t.Skip("Flaky test auto-skipped. 
See flaky-tests/TestRuler_rules_limit.md") store := newMockRuleStore(mockRulesLimit, nil) cfg := defaultRulerConfig(t) From dc77f0e9a91b168e3482d95bab90530517fffc30 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Mon, 13 Apr 2026 14:20:52 -0700 Subject: [PATCH 11/23] flaky-test-audit: skip TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod and TestRuler_rules New flaky tests from ci run 24366476213: - TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod: assertion failures on arm64 no-race in pkg/compactor - TestRuler_rules: same alert state race as TestRuler_rules_limit, "state":"unknown" vs "state":"inactive" in configs-db job Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- ...houldRemoveBlocksOutsideRetentionPeriod.md | 30 +++++++++++++++++++ flaky-tests/TestRuler_rules.md | 25 ++++++++++++++++ flaky-tests/audit-log.md | 1 + pkg/compactor/blocks_cleaner_test.go | 1 + pkg/ruler/api_test.go | 1 + 5 files changed, 58 insertions(+) create mode 100644 flaky-tests/TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod.md create mode 100644 flaky-tests/TestRuler_rules.md diff --git a/flaky-tests/TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod.md b/flaky-tests/TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod.md new file mode 100644 index 0000000000..bb8db0a6a0 --- /dev/null +++ b/flaky-tests/TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod.md @@ -0,0 +1,30 @@ +# Flaky Test: TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Multiple subtests failed on arm64 without `-race`. Errors at `blocks_cleaner_test.go:699` ("Not equal") and `blocks_cleaner_test.go:829`/`:866` ("Received unexpected error"). Likely timing-dependent block cleanup assertions that assume synchronous completion. 
+ +## Occurrences + +### 2026-04-13T21:00:49Z +- **Job**: [ci / test-no-race (arm64)](https://github.com/cortexproject/cortex/actions/runs/24366476213) +- **Package**: `github.com/cortexproject/cortex/pkg/compactor` +- **File**: `pkg/compactor/blocks_cleaner_test.go:699, :829, :866` +- **Notes**: Failed on arm64 without `-race`. Passed on amd64 (both race and no-race) and arm64 with `-race`. + +
Build logs + +``` +--- FAIL: TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod (0.47s) + Error Trace: /__w/cortex/cortex/pkg/compactor/blocks_cleaner_test.go:699 + Error: Not equal: + Error Trace: /__w/cortex/cortex/pkg/compactor/blocks_cleaner_test.go:829 + Error: Received unexpected error: + Error Trace: /__w/cortex/cortex/pkg/compactor/blocks_cleaner_test.go:699 + Error: Not equal: + Error Trace: /__w/cortex/cortex/pkg/compactor/blocks_cleaner_test.go:866 + Error: Received unexpected error: +FAIL github.com/cortexproject/cortex/pkg/compactor 90.399s +``` + +
diff --git a/flaky-tests/TestRuler_rules.md b/flaky-tests/TestRuler_rules.md new file mode 100644 index 0000000000..d218ef1817 --- /dev/null +++ b/flaky-tests/TestRuler_rules.md @@ -0,0 +1,25 @@ +# Flaky Test: TestRuler_rules + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Same alert state race as TestRuler_rules_limit. The test expects `"state":"unknown"` for an alerting rule but gets `"state":"inactive"`. The ruler evaluates rules asynchronously after startup, and the alert state transitions before the HTTP response is captured. This manifested in the `integration-configs-db` job which runs the ruler tests with a database backend. + +## Occurrences + +### 2026-04-13T21:05:42Z +- **Job**: [ci / integration-configs-db (ubuntu-24.04, amd64)](https://github.com/cortexproject/cortex/actions/runs/24366476213) +- **Package**: `github.com/cortexproject/cortex/pkg/ruler` +- **File**: `pkg/ruler/api_test.go:306` +- **Notes**: Same `"state":"unknown"` vs `"state":"inactive"` pattern as TestRuler_rules_limit. + +
Build logs + +``` + Error Trace: /go/src/github.com/cortexproject/cortex/pkg/ruler/api_test.go:306 + Error: Not equal: + expected: ..."state":"unknown"... + actual : ..."state":"inactive"... +--- FAIL: TestRuler_rules (0.05s) +``` + +
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 1e1230151f..677cd81332 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -12,3 +12,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-12T21:22:00Z | PASS | No flaky tests detected (with skips applied) | [ci run 24316099145](https://github.com/cortexproject/cortex/actions/runs/24316099145) | | 2026-04-12T21:37:26Z | FLAKY | TestQuerierWithStoreGatewayDataBytesLimits (integration/querier, amd64) — Docker containers vanished | [ci run 24316771541](https://github.com/cortexproject/cortex/actions/runs/24316771541) | | 2026-04-13T20:43:11Z | FLAKY | TestRuler_rules_limit (pkg/ruler) — alert state race; TestParquetFuzz (integration/query_fuzz) — fuzz non-determinism; requires_docker — Docker install failure | [ci run 24365433548](https://github.com/cortexproject/cortex/actions/runs/24365433548) | +| 2026-04-13T21:00:49Z | FLAKY | TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod (pkg/compactor, arm64); TestRuler_rules (pkg/ruler, configs-db) — alert state race | [ci run 24366476213](https://github.com/cortexproject/cortex/actions/runs/24366476213) | diff --git a/pkg/compactor/blocks_cleaner_test.go b/pkg/compactor/blocks_cleaner_test.go index ea24739257..7a017a1847 100644 --- a/pkg/compactor/blocks_cleaner_test.go +++ b/pkg/compactor/blocks_cleaner_test.go @@ -658,6 +658,7 @@ func TestBlocksCleaner_ListBlocksOutsideRetentionPeriod(t *testing.T) { } func TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod(t *testing.T) { + t.Skip("Flaky test auto-skipped. 
See flaky-tests/TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod.md") bucketClient, _ := cortex_testutil.PrepareFilesystemBucket(t) bucketClient = bucketindex.BucketWithGlobalMarkers(bucketClient) diff --git a/pkg/ruler/api_test.go b/pkg/ruler/api_test.go index 881866281c..3b0a1db22b 100644 --- a/pkg/ruler/api_test.go +++ b/pkg/ruler/api_test.go @@ -248,6 +248,7 @@ func stripEvaluationFields(t *testing.T, r util_api.Response) { } func TestRuler_rules(t *testing.T) { + t.Skip("Flaky test auto-skipped. See flaky-tests/TestRuler_rules.md") store := newMockRuleStore(mockRules, nil) cfg := defaultRulerConfig(t) From a9b732734e832468d7593009a221d6376e056673 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 10:44:25 -0700 Subject: [PATCH 12/23] flaky-test-audit: run #11 all tests passed, Docker Hub rate limit on one job CI run 24524134384: all test jobs passed. Only failure was integration_overrides Docker install hitting Docker Hub rate limit (toomanyrequests). Not a test issue. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 677cd81332..638cb25926 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -13,3 +13,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. 
Any test failure | 2026-04-12T21:37:26Z | FLAKY | TestQuerierWithStoreGatewayDataBytesLimits (integration/querier, amd64) — Docker containers vanished | [ci run 24316771541](https://github.com/cortexproject/cortex/actions/runs/24316771541) | | 2026-04-13T20:43:11Z | FLAKY | TestRuler_rules_limit (pkg/ruler) — alert state race; TestParquetFuzz (integration/query_fuzz) — fuzz non-determinism; requires_docker — Docker install failure | [ci run 24365433548](https://github.com/cortexproject/cortex/actions/runs/24365433548) | | 2026-04-13T21:00:49Z | FLAKY | TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod (pkg/compactor, arm64); TestRuler_rules (pkg/ruler, configs-db) — alert state race | [ci run 24366476213](https://github.com/cortexproject/cortex/actions/runs/24366476213) | +| 2026-04-16T17:27:44Z | INFRA | integration_overrides Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24524134384](https://github.com/cortexproject/cortex/actions/runs/24524134384) | From 52ec829f4cf7748ad2ec342cf6539e4c8904b4d2 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 11:06:35 -0700 Subject: [PATCH 13/23] flaky-test-audit: run #12 all tests passed, Docker Hub rate limit again CI run 24525138727: all test jobs passed. Only failure was requires_docker Docker install hitting Docker Hub rate limit. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 638cb25926..831d35f395 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -14,3 +14,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. 
Any test failure | 2026-04-13T20:43:11Z | FLAKY | TestRuler_rules_limit (pkg/ruler) — alert state race; TestParquetFuzz (integration/query_fuzz) — fuzz non-determinism; requires_docker — Docker install failure | [ci run 24365433548](https://github.com/cortexproject/cortex/actions/runs/24365433548) | | 2026-04-13T21:00:49Z | FLAKY | TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod (pkg/compactor, arm64); TestRuler_rules (pkg/ruler, configs-db) — alert state race | [ci run 24366476213](https://github.com/cortexproject/cortex/actions/runs/24366476213) | | 2026-04-16T17:27:44Z | INFRA | integration_overrides Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24524134384](https://github.com/cortexproject/cortex/actions/runs/24524134384) | +| 2026-04-16T17:50:32Z | INFRA | requires_docker Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24525138727](https://github.com/cortexproject/cortex/actions/runs/24525138727) | From fad1f66cf7adecc975c6f4e01214b82e686a0797 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 11:30:24 -0700 Subject: [PATCH 14/23] flaky-test-audit: skip TestQuerierWithBlocksStorageLimits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New flaky test from ci run 24526116820: integration test on arm64 expected HTTP 422 but got 500 — likely a race in limit-checking initialization. 
Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- .../TestQuerierWithBlocksStorageLimits.md | 27 +++++++++++++++++++ flaky-tests/audit-log.md | 1 + integration/querier_test.go | 1 + 3 files changed, 29 insertions(+) create mode 100644 flaky-tests/TestQuerierWithBlocksStorageLimits.md diff --git a/flaky-tests/TestQuerierWithBlocksStorageLimits.md b/flaky-tests/TestQuerierWithBlocksStorageLimits.md new file mode 100644 index 0000000000..ff56e3f02d --- /dev/null +++ b/flaky-tests/TestQuerierWithBlocksStorageLimits.md @@ -0,0 +1,27 @@ +# Flaky Test: TestQuerierWithBlocksStorageLimits + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: The test expects HTTP 422 (Unprocessable Entity) for a query exceeding limits but received HTTP 500 (Internal Server Error) instead. This is likely a race where the store-gateway or querier hasn't fully initialized its limit-checking path, causing the error to surface as a generic 500 rather than the expected 422. Failed on arm64. + +## Occurrences + +### 2026-04-16T18:18:08Z +- **Job**: [ci / integration (ubuntu-24.04-arm, arm64, integration_querier)](https://github.com/cortexproject/cortex/actions/runs/24526116820) +- **Package**: `github.com/cortexproject/cortex/integration` +- **File**: `integration/querier_test.go:468` +- **Notes**: Expected 422, got 500. Passed on amd64 in the same run. + +
Build logs + +``` + Error Trace: /home/runner/work/cortex/cortex/integration/querier_test.go:468 + Error: Not equal: + expected: 422 + actual : 500 + Test: TestQuerierWithBlocksStorageLimits +--- FAIL: TestQuerierWithBlocksStorageLimits (11.00s) +FAIL github.com/cortexproject/cortex/integration 193.371s +``` + +
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 831d35f395..6f1db34d11 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -15,3 +15,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-13T21:00:49Z | FLAKY | TestBlocksCleaner_ShouldRemoveBlocksOutsideRetentionPeriod (pkg/compactor, arm64); TestRuler_rules (pkg/ruler, configs-db) — alert state race | [ci run 24366476213](https://github.com/cortexproject/cortex/actions/runs/24366476213) | | 2026-04-16T17:27:44Z | INFRA | integration_overrides Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24524134384](https://github.com/cortexproject/cortex/actions/runs/24524134384) | | 2026-04-16T17:50:32Z | INFRA | requires_docker Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24525138727](https://github.com/cortexproject/cortex/actions/runs/24525138727) | +| 2026-04-16T18:18:08Z | FLAKY | TestQuerierWithBlocksStorageLimits (integration/querier, arm64) — expected 422 got 500 | [ci run 24526116820](https://github.com/cortexproject/cortex/actions/runs/24526116820) | diff --git a/integration/querier_test.go b/integration/querier_test.go index f4a17a665c..b8ae4f1c99 100644 --- a/integration/querier_test.go +++ b/integration/querier_test.go @@ -382,6 +382,7 @@ func TestQuerierWithBlocksStorageOnMissingBlocksFromStorage(t *testing.T) { } func TestQuerierWithBlocksStorageLimits(t *testing.T) { + t.Skip("Flaky test auto-skipped. See flaky-tests/TestQuerierWithBlocksStorageLimits.md") const blockRangePeriod = 5 * time.Second s, err := e2e.NewScenario(networkName) From 90d59d3746925cc52dc87ddfd6afe0b8bf92d105 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 11:52:45 -0700 Subject: [PATCH 15/23] flaky-test-audit: run #14 fully green CI run 24527178425: all jobs passed including all integration tests. No flaky tests detected, no infra failures. 
Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 6f1db34d11..bb53b057c6 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -16,3 +16,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-16T17:27:44Z | INFRA | integration_overrides Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24524134384](https://github.com/cortexproject/cortex/actions/runs/24524134384) | | 2026-04-16T17:50:32Z | INFRA | requires_docker Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24525138727](https://github.com/cortexproject/cortex/actions/runs/24525138727) | | 2026-04-16T18:18:08Z | FLAKY | TestQuerierWithBlocksStorageLimits (integration/querier, arm64) — expected 422 got 500 | [ci run 24526116820](https://github.com/cortexproject/cortex/actions/runs/24526116820) | +| 2026-04-16T18:52:00Z | PASS | All jobs passed, no flaky tests detected | [ci run 24527178425](https://github.com/cortexproject/cortex/actions/runs/24527178425) | From 95f2ff6986fede2f9d7af5e25c8c687c8dffa04f Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 12:15:58 -0700 Subject: [PATCH 16/23] flaky-test-audit: run #15 fully green (2nd consecutive) CI run 24528187016: all jobs passed. Two consecutive fully clean runs with all 8 skipped flaky tests and no new failures. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index bb53b057c6..99a62b00ee 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -17,3 +17,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. 
Any test failure | 2026-04-16T17:50:32Z | INFRA | requires_docker Docker install failed — Docker Hub rate limit (toomanyrequests). All tests passed. | [ci run 24525138727](https://github.com/cortexproject/cortex/actions/runs/24525138727) | | 2026-04-16T18:18:08Z | FLAKY | TestQuerierWithBlocksStorageLimits (integration/querier, arm64) — expected 422 got 500 | [ci run 24526116820](https://github.com/cortexproject/cortex/actions/runs/24526116820) | | 2026-04-16T18:52:00Z | PASS | All jobs passed, no flaky tests detected | [ci run 24527178425](https://github.com/cortexproject/cortex/actions/runs/24527178425) | +| 2026-04-16T19:15:00Z | PASS | All jobs passed, no flaky tests detected (2nd consecutive clean run) | [ci run 24528187016](https://github.com/cortexproject/cortex/actions/runs/24528187016) | From 10c2c6bec337e75d9cdc837c24c71e48f13aae15 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 12:38:37 -0700 Subject: [PATCH 17/23] flaky-test-audit: run #16 fully green (3rd consecutive) CI run 24529191249: all jobs passed. Three consecutive fully clean runs. The flaky test audit has stabilized with 9 flaky tests identified (1 fixed upstream, 8 skipped). Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 99a62b00ee..cd43cf2d73 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -18,3 +18,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. 
Any test failure | 2026-04-16T18:18:08Z | FLAKY | TestQuerierWithBlocksStorageLimits (integration/querier, arm64) — expected 422 got 500 | [ci run 24526116820](https://github.com/cortexproject/cortex/actions/runs/24526116820) | | 2026-04-16T18:52:00Z | PASS | All jobs passed, no flaky tests detected | [ci run 24527178425](https://github.com/cortexproject/cortex/actions/runs/24527178425) | | 2026-04-16T19:15:00Z | PASS | All jobs passed, no flaky tests detected (2nd consecutive clean run) | [ci run 24528187016](https://github.com/cortexproject/cortex/actions/runs/24528187016) | +| 2026-04-16T19:38:00Z | PASS | All jobs passed (3rd consecutive clean run) | [ci run 24529191249](https://github.com/cortexproject/cortex/actions/runs/24529191249) | From 1bddfe629a806058bea2c73003f758659d1c281c Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 13:36:31 -0700 Subject: [PATCH 18/23] flaky-test-audit: run #17 fully green (4th consecutive) Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index cd43cf2d73..cf9deb52b0 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -19,3 +19,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. 
Any test failure | 2026-04-16T18:52:00Z | PASS | All jobs passed, no flaky tests detected | [ci run 24527178425](https://github.com/cortexproject/cortex/actions/runs/24527178425) | | 2026-04-16T19:15:00Z | PASS | All jobs passed, no flaky tests detected (2nd consecutive clean run) | [ci run 24528187016](https://github.com/cortexproject/cortex/actions/runs/24528187016) | | 2026-04-16T19:38:00Z | PASS | All jobs passed (3rd consecutive clean run) | [ci run 24529191249](https://github.com/cortexproject/cortex/actions/runs/24529191249) | +| 2026-04-16T20:00:00Z | PASS | All jobs passed (4th consecutive clean run) | [ci run 24530173026](https://github.com/cortexproject/cortex/actions/runs/24530173026) | From 2acd77ad07036bc6e204de40ab74ae886f6c14f8 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 13:58:41 -0700 Subject: [PATCH 19/23] flaky-test-audit: run #18 fully green (5th consecutive) Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index cf9deb52b0..b246d69d54 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -20,3 +20,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. 
Any test failure | 2026-04-16T19:15:00Z | PASS | All jobs passed, no flaky tests detected (2nd consecutive clean run) | [ci run 24528187016](https://github.com/cortexproject/cortex/actions/runs/24528187016) | | 2026-04-16T19:38:00Z | PASS | All jobs passed (3rd consecutive clean run) | [ci run 24529191249](https://github.com/cortexproject/cortex/actions/runs/24529191249) | | 2026-04-16T20:00:00Z | PASS | All jobs passed (4th consecutive clean run) | [ci run 24530173026](https://github.com/cortexproject/cortex/actions/runs/24530173026) | +| 2026-04-16T20:58:00Z | PASS | All jobs passed (5th consecutive clean run) | [ci run 24532713349](https://github.com/cortexproject/cortex/actions/runs/24532713349) | From 9055d3511342b8c5267d4da5e7b9a8a808c0beee Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 14:22:37 -0700 Subject: [PATCH 20/23] flaky-test-audit: run #19 fully green (6th consecutive) Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index b246d69d54..fe464f8031 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -21,3 +21,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. 
Any test failure | 2026-04-16T19:38:00Z | PASS | All jobs passed (3rd consecutive clean run) | [ci run 24529191249](https://github.com/cortexproject/cortex/actions/runs/24529191249) | | 2026-04-16T20:00:00Z | PASS | All jobs passed (4th consecutive clean run) | [ci run 24530173026](https://github.com/cortexproject/cortex/actions/runs/24530173026) | | 2026-04-16T20:58:00Z | PASS | All jobs passed (5th consecutive clean run) | [ci run 24532713349](https://github.com/cortexproject/cortex/actions/runs/24532713349) | +| 2026-04-16T21:22:00Z | PASS | All jobs passed (6th consecutive clean run) | [ci run 24533695679](https://github.com/cortexproject/cortex/actions/runs/24533695679) | From 6e72f317c86d450a08c08ef59610b4ca9942301e Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 14:46:17 -0700 Subject: [PATCH 21/23] flaky-test-audit: skip TestMinimizeSpreadTokenGenerator New flaky test from ci run 24534699120: token spread distance error 0.01097 barely exceeded the 0.01 threshold. Non-deterministic due to randomized token generation. Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- .../TestMinimizeSpreadTokenGenerator.md | 29 +++++++++++++++++++ flaky-tests/audit-log.md | 1 + pkg/ring/token_generator_test.go | 1 + 3 files changed, 31 insertions(+) create mode 100644 flaky-tests/TestMinimizeSpreadTokenGenerator.md diff --git a/flaky-tests/TestMinimizeSpreadTokenGenerator.md b/flaky-tests/TestMinimizeSpreadTokenGenerator.md new file mode 100644 index 0000000000..db14133240 --- /dev/null +++ b/flaky-tests/TestMinimizeSpreadTokenGenerator.md @@ -0,0 +1,29 @@ +# Flaky Test: TestMinimizeSpreadTokenGenerator + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Floating point precision boundary. The test asserts that token spread distance error is <= 0.01, but the result was 0.01097 — barely over the threshold. 
The token generation algorithm involves randomness and the assertion threshold is too tight for non-deterministic outcomes. The overshoot of about 0.001 (roughly 10% of the tolerance) is many orders of magnitude larger than float64 rounding error, so this is a variance problem in the randomized output rather than a numeric-precision one. Failed on amd64 without `-race`. + +## Occurrences + +### 2026-04-16T21:27:42Z +- **Job**: [ci / test-no-race (amd64)](https://github.com/cortexproject/cortex/actions/runs/24534699120) +- **Package**: `github.com/cortexproject/cortex/pkg/ring` +- **File**: `pkg/ring/token_generator_test.go:202` (called from `:106`) +- **Notes**: Error was 0.01097, threshold is 0.01. Passed on arm64 and amd64 with `-race`. + +
Build logs + +``` +--- FAIL: TestMinimizeSpreadTokenGenerator (2.03s) + token_generator_test.go:202: + Error Trace: /__w/cortex/cortex/pkg/ring/token_generator_test.go:202 + /__w/cortex/cortex/pkg/ring/token_generator_test.go:106 + Error: Condition failed! + Test: TestMinimizeSpreadTokenGenerator + Messages: [minimize-42-zone1] expected and real distance error is greater than 0.01 -> 0.010970902851567321[8.2595524e+07/8.3511723e+07] +FAIL +FAIL github.com/cortexproject/cortex/pkg/ring 67.285s +``` + +
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index fe464f8031..3535d3bbe9 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -22,3 +22,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-16T20:00:00Z | PASS | All jobs passed (4th consecutive clean run) | [ci run 24530173026](https://github.com/cortexproject/cortex/actions/runs/24530173026) | | 2026-04-16T20:58:00Z | PASS | All jobs passed (5th consecutive clean run) | [ci run 24532713349](https://github.com/cortexproject/cortex/actions/runs/24532713349) | | 2026-04-16T21:22:00Z | PASS | All jobs passed (6th consecutive clean run) | [ci run 24533695679](https://github.com/cortexproject/cortex/actions/runs/24533695679) | +| 2026-04-16T21:27:42Z | FLAKY | TestMinimizeSpreadTokenGenerator (pkg/ring) — floating point precision boundary (0.01097 > 0.01 threshold) | [ci run 24534699120](https://github.com/cortexproject/cortex/actions/runs/24534699120) | diff --git a/pkg/ring/token_generator_test.go b/pkg/ring/token_generator_test.go index a76826eb42..2ba5b2817b 100644 --- a/pkg/ring/token_generator_test.go +++ b/pkg/ring/token_generator_test.go @@ -76,6 +76,7 @@ func TestGenerateTokens_IgnoresOldTokens(t *testing.T) { } func TestMinimizeSpreadTokenGenerator(t *testing.T) { + t.Skip("Flaky test auto-skipped. 
See flaky-tests/TestMinimizeSpreadTokenGenerator.md") rindDesc := NewDesc() zones := []string{"zone1", "zone2", "zone3"} From 3a60442c7425d3b64081d42301c8c304bbd48fbc Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 15:08:40 -0700 Subject: [PATCH 22/23] flaky-test-audit: run #21 fully green Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/audit-log.md | 1 + 1 file changed, 1 insertion(+) diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 3535d3bbe9..81821c0bf9 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -23,3 +23,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-16T20:58:00Z | PASS | All jobs passed (5th consecutive clean run) | [ci run 24532713349](https://github.com/cortexproject/cortex/actions/runs/24532713349) | | 2026-04-16T21:22:00Z | PASS | All jobs passed (6th consecutive clean run) | [ci run 24533695679](https://github.com/cortexproject/cortex/actions/runs/24533695679) | | 2026-04-16T21:27:42Z | FLAKY | TestMinimizeSpreadTokenGenerator (pkg/ring) — floating point precision boundary (0.01097 > 0.01 threshold) | [ci run 24534699120](https://github.com/cortexproject/cortex/actions/runs/24534699120) | +| 2026-04-16T22:08:00Z | PASS | All jobs passed | [ci run 24535673138](https://github.com/cortexproject/cortex/actions/runs/24535673138) | From 9d66a12ff349daddbf758ab988deee570f2c73a1 Mon Sep 17 00:00:00 2001 From: Charlie Le Date: Thu, 16 Apr 2026 15:31:25 -0700 Subject: [PATCH 23/23] flaky-test-audit: skip TestVerticalShardingFuzz New flaky test from ci run 24536561113: non-deterministic fuzz test on arm64 with Docker container disappearance. 
Signed-off-by: Charlie Le Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Charlie Le --- flaky-tests/TestVerticalShardingFuzz.md | 25 +++++++++++++++++++++++++ flaky-tests/audit-log.md | 1 + integration/query_fuzz_test.go | 1 + 3 files changed, 27 insertions(+) create mode 100644 flaky-tests/TestVerticalShardingFuzz.md diff --git a/flaky-tests/TestVerticalShardingFuzz.md b/flaky-tests/TestVerticalShardingFuzz.md new file mode 100644 index 0000000000..4b391e5eee --- /dev/null +++ b/flaky-tests/TestVerticalShardingFuzz.md @@ -0,0 +1,25 @@ +# Flaky Test: TestVerticalShardingFuzz + +**Status**: Skipped +**Occurrences**: 1 +**Root Cause**: Non-deterministic fuzz test. Generates random PromQL queries and compares results across sharded and non-sharded execution. The captured log does not show which query failed, so the exact cause is unconfirmed; plausible candidates are floating point differences between the two execution paths or edge cases in the generated queries. The failure was also accompanied by Docker container disappearance (`No such container: e2e-cortex-test-prometheus`). + +## Occurrences + +### 2026-04-16T22:19:31Z +- **Job**: [ci / integration (ubuntu-24.04-arm, arm64, integration_query_fuzz)](https://github.com/cortexproject/cortex/actions/runs/24536561113) +- **Package**: `github.com/cortexproject/cortex/integration` +- **File**: `integration/query_fuzz_test.go:1816` +- **Notes**: Failed on arm64. Passed on amd64 in the same run. + +
Build logs + +``` + Error Trace: /home/runner/work/cortex/cortex/integration/query_fuzz_test.go:1816 + Error: finished query fuzzing tests +--- FAIL: TestVerticalShardingFuzz (14.28s) +22:18:53 Error response from daemon: No such container: e2e-cortex-test-prometheus +FAIL github.com/cortexproject/cortex/integration 184.302s +``` + +
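One common way to make a randomized comparison like the one described above reproducible is to log the seed and allow a failing run to be replayed with it. The following is a minimal sketch using only the Go standard library; the `FUZZ_SEED` environment variable and the `seedFromEnv` and `pickCases` helpers are hypothetical and are not part of the Cortex test suite, which may already handle seeding differently.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"strconv"
	"time"
)

// seedFromEnv returns the value of FUZZ_SEED if it parses as an integer,
// otherwise a time-based seed. FUZZ_SEED is a hypothetical variable name.
func seedFromEnv() int64 {
	if v, err := strconv.ParseInt(os.Getenv("FUZZ_SEED"), 10, 64); err == nil {
		return v
	}
	return time.Now().UnixNano()
}

// pickCases stands in for random query generation: it draws n values
// from a generator seeded with the given seed.
func pickCases(seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = rng.Intn(1000)
	}
	return out
}

func main() {
	seed := seedFromEnv()
	// Logging the seed lets a failing run be replayed with FUZZ_SEED=<seed>.
	fmt.Printf("fuzz seed: %d\n", seed)
	fmt.Println(pickCases(seed, 5))
}
```

With the seed recorded in the CI log, a failure such as "1 test cases failed" could be rerun locally with the same generated inputs instead of being skipped outright.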
diff --git a/flaky-tests/audit-log.md b/flaky-tests/audit-log.md index 81821c0bf9..c2116cb814 100644 --- a/flaky-tests/audit-log.md +++ b/flaky-tests/audit-log.md @@ -24,3 +24,4 @@ This file tracks every CI run on the `flaky-test-audit` branch. Any test failure | 2026-04-16T21:22:00Z | PASS | All jobs passed (6th consecutive clean run) | [ci run 24533695679](https://github.com/cortexproject/cortex/actions/runs/24533695679) | | 2026-04-16T21:27:42Z | FLAKY | TestMinimizeSpreadTokenGenerator (pkg/ring) — floating point precision boundary (0.01097 > 0.01 threshold) | [ci run 24534699120](https://github.com/cortexproject/cortex/actions/runs/24534699120) | | 2026-04-16T22:08:00Z | PASS | All jobs passed | [ci run 24535673138](https://github.com/cortexproject/cortex/actions/runs/24535673138) | +| 2026-04-16T22:19:31Z | FLAKY | TestVerticalShardingFuzz (integration/query_fuzz, arm64) — non-deterministic fuzz + Docker container vanished | [ci run 24536561113](https://github.com/cortexproject/cortex/actions/runs/24536561113) | diff --git a/integration/query_fuzz_test.go b/integration/query_fuzz_test.go index 1ef5979f49..221a0f71a0 100644 --- a/integration/query_fuzz_test.go +++ b/integration/query_fuzz_test.go @@ -662,6 +662,7 @@ func TestExpandedPostingsCacheFuzz(t *testing.T) { } func TestVerticalShardingFuzz(t *testing.T) { + t.Skip("Flaky test auto-skipped. See flaky-tests/TestVerticalShardingFuzz.md") s, err := e2e.NewScenario(networkName) require.NoError(t, err) defer s.Close()