Update test to sleep 0.5s between `component new` and `component delete` #1368
See https://github.com/projectsyn/commodore/actions/runs/25909921786/job/76152183076?pr=1367 for a test failure in CI.
We seem to have a race condition between the component Git repo getting initialized by `component new` and deleted by `component delete`.

Without the sleep, the test fails roughly one in ten times or more often (initial local numbers are between 9/100 and 24/100 for separate runs). The failure appears to occur because something is still creating files in the bare component checkout while `component delete` is deleting the bare checkout directory.

We observe a slight increase in test failures (locally: approx. 1-2 / 100) when we additionally assert that the bare checkout is fully deleted after `component delete`; in some cases `shutil.rmtree()` apparently succeeds but the directory remains.

We've landed on a 0.5s sleep after trying a couple of values. We've seen no failures in ~8000 consecutive test runs with the 0.5s sleep. For CI, a single 0.5s delay in one test (that already takes 1.7-1.8s in isolation without the sleep) out of >1000 should be acceptable.

Note that we don't actually understand the root cause here, but the failing test is a synthetic sequence of commands that should almost never happen in real usage without at least 1s delay between them.

Additionally, note that the flakiness disappears when we switch the test to `subprocess.call()` instead of click's `CliRunner`, even without the sleep (or at least it is rare enough that it doesn't show up in >1000 runs).
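For illustration, the shape of the workaround can be sketched roughly as follows. This is not the actual test from the repository: `fake_component_new` and `fake_component_delete` are hypothetical stand-ins for the real `component new` and `component delete` commands, and they only create and remove a directory. The sketch shows the delay between the two steps and the stricter check that the checkout directory is really gone afterwards.

```python
import shutil
import tempfile
import time
from pathlib import Path

# Delay between the two commands; 0.5s is the value chosen in this PR.
SETTLE_DELAY_S = 0.5


def fake_component_new(base: Path, name: str) -> Path:
    """Hypothetical stand-in for `component new`: creates a checkout directory."""
    checkout = base / name
    (checkout / "objects").mkdir(parents=True)
    (checkout / "HEAD").write_text("ref: refs/heads/master\n")
    return checkout


def fake_component_delete(checkout: Path) -> None:
    """Hypothetical stand-in for `component delete`: removes the checkout."""
    shutil.rmtree(checkout)


def run_new_then_delete(base: Path, delay: float = SETTLE_DELAY_S) -> bool:
    """Run new, wait, then delete; return True if the checkout is fully gone."""
    checkout = fake_component_new(base, "test-component")
    # Give any background work from the "new" step time to settle.
    time.sleep(delay)
    fake_component_delete(checkout)
    # Stricter check: the directory must not exist after deletion.
    return not checkout.exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_new_then_delete(Path(tmp)))
```

The stand-ins are synchronous, so this sketch cannot reproduce the race itself; it only shows where the sleep and the assertion would sit in the sequence.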
Update: given the frequency of failing tests for new dependency PRs, there must be some dependency change that causes the test to flake. I don't see anything in recent GitPython changes (3.1.46..3.1.50). Notably, on my laptop with Git 2.43.0 (from the default Ubuntu 24.04 package), I can't seem to reproduce the flaking, so this might be caused by a change introduced in a more recent Git version (since I'm running the latest mainline Git from the Git stable releases PPA on my desktop).

Update 2: After updating Git to 2.54.0 from the Git stable releases PPA on my laptop, I now see 193/250 runs fail without the fix.
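Failure counts like 9/100 or 193/250 come from running the same test repeatedly and counting the failures. A minimal sketch of such a counting loop is shown below, assuming the test can be wrapped in a plain Python callable that raises `AssertionError` on failure. The helper names (`count_failures`, `format_rate`, `make_every_nth_failing`) are made up for this example and are not part of the repository; the simulated flaky test fails deterministically so the sketch is reproducible.

```python
from typing import Callable


def count_failures(test_fn: Callable[[], None], runs: int) -> int:
    """Call test_fn `runs` times and count how often it raises AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return failures


def format_rate(failures: int, runs: int) -> str:
    """Render a failure count as 'failures/runs (percent)'."""
    return f"{failures}/{runs} ({failures / runs:.1%})"


def make_every_nth_failing(n: int) -> Callable[[], None]:
    """Hypothetical stand-in for a flaky test: fails on every n-th call."""
    calls = 0

    def test_fn() -> None:
        nonlocal calls
        calls += 1
        assert calls % n != 0, "simulated flake"

    return test_fn


if __name__ == "__main__":
    runs = 100
    failures = count_failures(make_every_nth_failing(4), runs)
    print(format_rate(failures, runs))
```

For a real test, the callable would have to run the actual test body (or invoke the test runner) instead of the simulated flake.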
While it might be interesting to pinpoint the Git change (maybe in combination with a Click `CliRunner` change) that causes the flaking, it's probably not worth the effort since we can reliably avoid the flaking with just a 0.5s sleep.