Problem
tests/mock_vws/test_docker.py::test_build_and_run occasionally fails with:
ValueError: Container vws-mock-target-manager-... is not healthy: unhealthy
When this happens, the only information the test surfaces is the final health_status string. We have no visibility into:
- The container's stdout/stderr (did Flask crash? did it never start listening?)
- Docker's
State.Health.Log entries (what did each healthcheck probe actually return?)
This makes it impossible to tell whether the failure is a flake (slow startup on a loaded runner exhausting Docker's internal healthcheck retries) or a real bug in the image.
Proposal
When wait_for_health_check exhausts its retries in this test, before raising, dump:
container.logs().decode() for each container
container.attrs["State"]["Health"]["Log"] (each probe's exit code + output)
Attach them to the assertion message so CI logs contain enough information to diagnose the failure on the next occurrence.
Context
Seen on https://github.com/VWS-Python/vws-python-mock/actions/runs/25480041288/job/74761924239 (PR #3146). Re-run passed, consistent with flakiness, but we should make the next failure actionable.
Problem
tests/mock_vws/test_docker.py::test_build_and_runoccasionally fails with:When this happens, the only information the test surfaces is the final
health_statusstring. We have no visibility into:State.Health.Logentries (what did each healthcheck probe actually return?)This makes it impossible to tell whether the failure is a flake (slow startup on a loaded runner exhausting Docker's internal healthcheck retries) or a real bug in the image.
Proposal
When
wait_for_health_checkexhausts its retries in this test, before raising, dump:container.logs().decode()for each containercontainer.attrs["State"]["Health"]["Log"](each probe's exit code + output)Attach them to the assertion message so CI logs contain enough information to diagnose the failure on the next occurrence.
Context
Seen on https://github.com/VWS-Python/vws-python-mock/actions/runs/25480041288/job/74761924239 (PR #3146). Re-run passed, consistent with flakiness, but we should make the next failure actionable.