test(ci): worktree-multi-concurrent-e2e — add non-worktree base stack (3 → 4 concurrent stacks)#843
Conversation
… (3 → 4 concurrent stacks) Extends the multi-concurrent job to run a fourth stack alongside the three worktree ones: the plain non-worktree base-repo stack from openemr/docker/development-easy/ via `openemr-cmd up`. Uses the default port 8300 (no offset) — no conflict with worktree offsets 8301/02/03. What it newly verifies: - Worktree (3 stacks) + non-worktree (1 stack) coexist under concurrent load. No port collision, no compose-project name collision (`development-easy` vs `openemr-multi-*`), no daemon serialization deadlock. - HOST_UID adoption holds for BOTH modes simultaneously — the new "Verify bind mount is host-owned in BOTH modes" step stats sites/default/sqlconf.php across all four bind mounts and asserts ownership == runner uid. If adoption regresses in either mode, this catches it specifically; if it regresses ONLY under concurrency (e.g., race with HOST_UID export), this job is the only place we'd see it. - The non-worktree-only e2e job (nonworktree-lifecycle-e2e) covers the same ownership assertion in isolation; this job adds the concurrency dimension. Lifecycle: - Add 4th stack via `openemr-cmd up` from docker/development-easy/ after the three worktree adds - Expand healthy-poll to wait for all 4 containers - Expand HTTP smoke to include port 8300 - Expand cross-talk check to confirm 4 distinct container IDs - Tear down non-worktree first (via `openemr-cmd down -v`) to free daemon bandwidth for the worktree removes - Extend hygiene check to also verify `development-easy` project containers + volumes are gone (label-based filter for the volumes since the non-worktree project uses default volume names without the `openemr-multi-*_` prefix) Job name: "three concurrent worktrees" → "four concurrent stacks (3 worktrees + 1 non-worktree base)". Timeout bumped 40 → 50 min for the fourth parallel boot. Assisted-by: Claude Code
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe workflow job now runs four concurrent OpenEMR stacks: three worktree stacks and one ChangesConcurrent OpenEMR stack validation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…poser-install load The previous push had easy stack briefly flap to unhealthy at 260s (~80s past start_period) while still in the composer-install phase. The 3-stack version of this job had the same "bail on first unhealthy" logic and worked because 3 stacks always finished install inside start_period (3m); 4 stacks share CPU/disk more heavily and the install drags past start_period, allowing one transient healthcheck failure to trip the bail. Fix: require SUSTAINED unhealthy (≥6 consecutive polls = ~60s) before bailing. Single flaps recover on the next poll without killing the whole job. The 15-min poll timeout is still the final guard against genuinely-broken containers. Assisted-by: Claude Code
…eout to 30m + outer 60m Previous fix (sustained-unhealthy bail at 60s) wasn't enough — all four containers cascade into unhealthy state simultaneously during composer install under 4-way contention, all going past start_period together. The 60s threshold caught the cascade as soon as the streak hit it. Reality check from CI: each container's composer install takes ~15-20 min when 4 share CPU/disk/image-pull bandwidth (vs ~5 min single-stack baseline). During that long install the `/meta/health/readyz` probes fail because apache isn't fully serving yet. So "unhealthy for >60s" is the EXPECTED state, not a regression signal — bailing on it killed perfectly healthy progress. Final fix: - Remove the unhealthy-bail entirely. Just poll until all-four-healthy or the inner loop times out at 30 min (180 polls × 10s). - Outer job timeout bumped 50 → 60 min for the longer healthy-poll budget + downstream steps (smoke, exec, ownership, teardown, hygiene ~ 15 min total). - A genuinely-broken container is still caught: the 30-min inner timeout fires with a clear message + all four container logs. The 3-stack version still uses the original bail-on-first-unhealthy because 3 stacks DO finish install inside start_period (verified by its passing CI for months); only 4-stack contention breaks that assumption. Assisted-by: Claude Code
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/test-bats-openemr-cmd-real-docker.yml (1)
704-706: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winValidate worktree offsets before adding the
nonwt:0smoke target.If any
jq -roffset isnullor0, Bash arithmetic maps that worktree smoke probe to port8300, so it can hit the non-worktree stack and hide an offset regression.Proposed fix
oe=$(jq -r '."multi-easy".offset' .worktrees.json) ol=$(jq -r '."multi-light".offset' .worktrees.json) or=$(jq -r '."multi-redis".offset' .worktrees.json) + for branch_offset in "multi-easy:${oe}" "multi-light:${ol}" "multi-redis:${or}"; do + branch="${branch_offset%%:*}" + offset="${branch_offset#*:}" + [[ "${offset}" =~ ^[1-9][0-9]*$ ]] || { echo "FAIL: ${branch} offset is not a positive integer: '${offset}'"; exit 1; } + done + offset_set=$(printf '%s\n%s\n%s\n' "${oe}" "${ol}" "${or}" | sort -n | paste -sd, -) + [[ "${offset_set}" = "1,2,3" ]] || { echo "FAIL: expected worktree offsets 1,2,3; got ${offset_set}"; exit 1; } for entry in "easy:${oe}" "light:${ol}" "redis:${or}" "nonwt:0"; do🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test-bats-openemr-cmd-real-docker.yml around lines 704 - 706, The smoke-target loop is using raw offsets in the port calculation, so a null or zero worktree offset can collapse to the same port as the non-worktree stack. Update the workflow logic around the worktree offset extraction and the entry loop so offsets are validated before use, and ensure the `nonwt` probe is kept separate from any worktree-based ports. Use the `jq -r` offset handling and the `for entry in ...` block to guard against null/0 values before computing `port`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/test-bats-openemr-cmd-real-docker.yml:
- Around line 674-684: The health-check loop in the workflow currently keeps
retrying even when docker inspect returns missing for any of the containers,
which should be treated as an immediate failure. Update the loop around the
container status checks for cname_e, cname_l, cname_r, and cname_n so that
missing container names fail fast with a clear error, while still continuing to
retry only when the status is unhealthy. Keep the existing success path in the
same health-wait section and preserve the current container status reporting.
---
Outside diff comments:
In @.github/workflows/test-bats-openemr-cmd-real-docker.yml:
- Around line 704-706: The smoke-target loop is using raw offsets in the port
calculation, so a null or zero worktree offset can collapse to the same port as
the non-worktree stack. Update the workflow logic around the worktree offset
extraction and the entry loop so offsets are validated before use, and ensure
the `nonwt` probe is kept separate from any worktree-based ports. Use the `jq
-r` offset handling and the `for entry in ...` block to guard against null/0
values before computing `port`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: d7e8dc53-dc58-4a5d-bda0-7a44506f0578
📒 Files selected for processing (1)
.github/workflows/test-bats-openemr-cmd-real-docker.yml
…erally absent) CodeRabbit catch: the no-bail rule for `unhealthy` was right (slow composer install under 4-way contention transiently flaps probes), but it ALSO swallowed `missing` — the container literally not existing — which is a hard failure not worth waiting 30 min on. Cause analysis matrix: - `missing` = compose failed to create the container, or the name/ project naming contract changed. No amount of waiting fixes it. Bail immediately with `docker ps -a` dump. - `starting` / `unhealthy` = slow probe results during install. Keep polling — this is the expected transient state under load. - `healthy` (all 4) = ready, exit 0. Compose up -d returns after creating containers, so by iteration 1 of this poll all four should exist; missing at iter 1+ means something broke at the orchestration layer, not the application layer. Assisted-by: Claude Code
Summary
Extends
worktree-multi-concurrent-e2efrom 3 stacks to 4 by adding the plain non-worktree base-repo stack (openemr-cmd upfromopenemr/docker/development-easy/) alongside the three worktree stacks.No port or project-name conflict by design.
What this newly verifies
sites/default/sqlconf.phpacross all 4 bind mounts and asserts ownership == runner uid. If adoption regresses only under concurrency (e.g., HOST_UID export race), this job catches itdevelopment-easyproject's containers + volumes are also gone (label-based filter for non-worktree volumes since they use default names without theopenemr-multi-*_prefix)Lifecycle delta vs prior 3-stack version
openemr-cmd upstep fromdocker/development-easy/after the 3 worktree addsopenemr-cmd down(with volumes) for the non-worktree stack first (frees daemon bandwidth for the subsequent worktree removes)development-easycompose project tooCost
Timeout bumped 40 → 50 min for the fourth parallel boot. Public-repo runner so no $ cost; the only real cost is reviewer attention if a regression specifically surfaces under 4-way concurrency (which is the signal we want).
The standalone
nonworktree-lifecycle-e2ejob still exists and covers the non-worktree path in isolation — this PR adds the concurrency dimension, not a replacement.Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes