
Update test to sleep 0.5s between component new and component delete#1368

Merged
simu merged 2 commits into master from fix/flaky-test-component-new-then-delete
May 18, 2026

Conversation


@simu simu commented May 15, 2026

We seem to have a race condition between the component Git repo getting initialized by component new and deleted by component delete.

Without the sleep, the test fails roughly one in ten times or more (initial local numbers range from 9/100 to 24/100 across separate runs). The failure appears to occur because something is still creating files in the bare component checkout while component delete is deleting the bare checkout directory.
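Failure counts like the ones above can be gathered by running the test repeatedly and tallying the failed runs. A minimal sketch, assuming a hypothetical `run_once` callable that stands in for a single test run (it is not part of this project):

```python
from typing import Callable


def count_failures(run_once: Callable[[], None], runs: int) -> int:
    """Call `run_once` `runs` times and return how many calls raised."""
    failures = 0
    for _ in range(runs):
        try:
            run_once()
        except Exception:
            # Any exception (including AssertionError) counts as a failed run.
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical stand-in for the real test: raises on every fourth call.
    calls = {"n": 0}

    def fake_test() -> None:
        calls["n"] += 1
        if calls["n"] % 4 == 0:
            raise AssertionError("simulated flaky failure")

    total = 100
    print(f"{count_failures(fake_test, total)}/{total} runs failed")
```

Separate batches of 100 runs can differ noticeably for a flaky test, which fits the spread reported above.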

We observe slightly more test failures (locally: approx. 1-2/100) when we assert that the bare checkout is fully deleted after component delete: in some cases shutil.rmtree() apparently succeeds but the directory remains.
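The stricter check described above can be sketched as follows. The helper name `remove_and_verify` and the directory layout are assumptions for illustration; the actual test and the `component delete` implementation may look different.

```python
import shutil
import tempfile
from pathlib import Path


def remove_and_verify(path: Path) -> None:
    """Delete `path` recursively and fail loudly if it is still present."""
    shutil.rmtree(path)
    # rmtree() returning without an exception is not treated as proof here:
    # check explicitly that the directory is really gone.
    assert not path.exists(), f"{path} still exists after rmtree()"


if __name__ == "__main__":
    # Hypothetical stand-in for a bare component checkout.
    checkout = Path(tempfile.mkdtemp()) / "component.git"
    (checkout / "objects").mkdir(parents=True)
    (checkout / "HEAD").write_text("ref: refs/heads/master\n")
    remove_and_verify(checkout)
    print("checkout removed")
```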

We've landed on a 0.5s sleep after trying a couple of values. We've seen no failures in ~8000 consecutive test runs with the 0.5s sleep. For CI, a single 0.5s delay in one test (that already takes 1.7-1.8s in isolation without the sleep) out of >1000 should be acceptable.
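In outline, the change pauses between the two commands in the test. The sketch below only illustrates that shape: `component_new`, `component_delete`, and `new_then_delete` are hypothetical stand-ins built on plain directory operations, not the project's real commands or test code. Only the 0.5s value comes from this PR.

```python
import shutil
import tempfile
import time
from pathlib import Path

# Delay chosen in this PR; everything else below is a hypothetical stand-in.
DELAY_SECONDS = 0.5


def component_new(path: Path) -> None:
    """Stand-in for `component new`: create a directory with a marker file."""
    path.mkdir(parents=True)
    (path / "HEAD").write_text("ref: refs/heads/master\n")


def component_delete(path: Path) -> None:
    """Stand-in for `component delete`: remove the directory recursively."""
    shutil.rmtree(path)


def new_then_delete(path: Path, delay: float = DELAY_SECONDS) -> None:
    component_new(path)
    # Give any work left over from creation time to finish before deleting.
    time.sleep(delay)
    component_delete(path)
    assert not path.exists()


if __name__ == "__main__":
    new_then_delete(Path(tempfile.mkdtemp()) / "component")
    print("created and deleted with a pause in between")
```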

Note that we don't actually understand the root cause here, but the failing test is a synthetic sequence of commands that should almost never happen in real usage without at least 1s delay between them.

Additionally, note that the flakiness disappears when we switch the test to subprocess.call() instead of click's CliRunner, even without the sleep (or at least is rare enough that it doesn't show up in >1000 runs).
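One difference between the two approaches is that click's `CliRunner` invokes the command inside the test process, whereas `subprocess.call()` starts a separate process and returns only once that process has exited. Whether this explains the difference in flakiness is not established here. A minimal sketch of the subprocess variant, using small inline Python scripts as hypothetical stand-ins for the real CLI commands:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the real `component new` / `component delete`.
NEW_SCRIPT = "import pathlib, sys; pathlib.Path(sys.argv[1]).mkdir(parents=True)"
DELETE_SCRIPT = "import shutil, sys; shutil.rmtree(sys.argv[1])"


def run_script(script: str, target: Path) -> int:
    """Run `script` in a fresh Python process and return its exit code."""
    # subprocess.call() waits for the child process to exit before returning.
    return subprocess.call([sys.executable, "-c", script, str(target)])


def new_then_delete_in_subprocesses(target: Path) -> None:
    assert run_script(NEW_SCRIPT, target) == 0
    assert target.is_dir()
    assert run_script(DELETE_SCRIPT, target) == 0
    assert not target.exists()


if __name__ == "__main__":
    new_then_delete_in_subprocesses(Path(tempfile.mkdtemp()) / "component")
    print("both commands ran in separate processes")
```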

Checklist

  • Keep pull requests small so they can be easily reviewed.
  • Update tests.
  • Categorize the PR by setting a good title and adding one of the labels:
    bug, enhancement, documentation, change, breaking, dependency, internal
    as they show up in the changelog

@simu simu requested a review from a team as a code owner May 15, 2026 15:42
@simu simu added the internal Internal changes which don't affect users but should appear in the changelog label May 15, 2026
@simu simu changed the title Sleep 0.5s between component new and component delete Update test to sleep 0.5s between component new and component delete May 15, 2026
@simu simu requested a review from a team May 15, 2026 15:44
@simu simu force-pushed the fix/flaky-test-component-new-then-delete branch from dfb86bc to 6d164b5 Compare May 15, 2026 16:30

simu commented May 18, 2026

Update: given the frequency of failing tests for new dependency PRs, there must be some dependency change that causes the test to flake. I don't see anything in recent GitPython changes (3.1.46..3.1.50).

Notably, on my laptop with Git 2.43.0 (from the default Ubuntu 24.04 package), I can't seem to reproduce the flaking, so this might be caused by a change introduced in a more recent Git version (on my desktop, I'm running the latest mainline Git from the Git stable releases PPA).

Update 2: After updating Git to 2.54.0 from the Git stable releases PPA on my laptop, I now see 193/250 runs fail without the fix.
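Since the failure rate appears to depend on the installed Git version, it can help to record that version alongside the results. A minimal sketch, assuming the usual `git version X.Y.Z` output format of `git --version`; the function names are illustrative and not part of this project:

```python
import re
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_git_version(output: str) -> Optional[Version]:
    """Extract (major, minor, patch) from `git --version` output, if present."""
    match = re.search(r"git version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))


def installed_git_version() -> Optional[Version]:
    """Return the local Git version, or None if Git is missing or fails."""
    try:
        result = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_git_version(result.stdout)


if __name__ == "__main__":
    print(f"Git version under test: {installed_git_version()}")
```

Version tuples compare element by element, so `parse_git_version("git version 2.54.0") > parse_git_version("git version 2.43.0")` holds for the two versions mentioned above.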


simu commented May 18, 2026

While it might be interesting to pinpoint the Git change (maybe in combination with Click's CliRunner) that causes the flaking, it's probably not worth the effort since we can reliably avoid the flaking with just a 0.5s sleep.

@simu simu merged commit 83f6105 into master May 18, 2026
28 checks passed
@simu simu deleted the fix/flaky-test-component-new-then-delete branch May 18, 2026 09:14

Labels

internal Internal changes which don't affect users but should appear in the changelog
