fix(deploy): remove API_NUM_WORKERS footgun, scale via Ray Serve#501
Conversation
The uvicorn deployment path fed API_NUM_WORKERS into `uvicorn --workers N`, but the app calls ray.init() at import time, so each extra worker starts its own isolated Ray cluster with duplicate named actors (Indexer, Vectordb, TaskStateManager), fragmenting task state and vector-DB access. The flag was silently ignored until v1.1.12 because the entrypoint always passed --reload (which forces a single uvicorn worker); gating --reload behind UVICORN_RELOAD=true (PR #478, N8) unmasked it. - entrypoint.sh: always run a single uvicorn worker; warn if API_NUM_WORKERS is set to a non-1 value, pointing operators to Ray Serve. - charts: drop the dead API_NUM_WORKERS: "8" (the chart runs Ray Serve, which takes the api.py branch and never reads it). - .env.example / docs: remove the knob and document Ray Serve (ENABLE_RAY_SERVE + RAY_SERVE_NUM_REPLICAS) as the HTTP scaling path. Closes #500
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (2)
📝 WalkthroughWalkthroughRemoves ChangesSingle Worker Enforcement and Documentation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@entrypoint.sh`:
- Line 31: The uvicorn port binding uses an incorrect environment variable name
APP_iPORT (with a lowercase 'i') instead of APP_PORT (with a capital 'P').
Change the environment variable reference in the uv run command from APP_iPORT
to APP_PORT so that the correct application port environment variable is
recognized and used instead of always falling back to the default port 8080.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1f02c847-0de0-4cc6-9acb-510604375ffd
📒 Files selected for processing (4)
.env.examplecharts/openrag-stack/values.yamldocs/content/docs/documentation/env_vars.mdentrypoint.sh
.env.example removed the API_NUM_WORKERS knob, but its two hand-maintained mirrors under docs/assets/ (env_example.env, env_linux_gpu.env) still advertised it with the old, now-incorrect description. These files are embedded in the quickstart docs, so users following them would still copy the retired knob. Apply the same comment as .env.example pointing to the Ray Serve scaling path.
2ca0c6f to
319bc5c
Compare
What & why
API_NUM_WORKERSis a footgun. In the uvicorn deployment path it feduvicorn --workers N, but the app callsray.init()at import time (openrag/api.py), so every uvicorn worker is a separate process that starts its own isolated Ray cluster with duplicate named actors (Indexer,Vectordb,TaskStateManager, the loader pools). Anything> 1silently breaks the shared-actor architecture: task state fragments across clusters, multipleIndexer/Vectordbactors contend on the same Milvus collection / Postgres DB, and resources multiply N×.Why it surfaced now
It was a no-op until v1.1.12. The entrypoint used to always pass
--reload, and uvicorn dispatchesshould_reloadbeforeworkers > 1, so--reloadforced a single worker regardless of the value. Gating--reloadbehindUVICORN_RELOAD=true(#478, "N8") unmasked the setting — now all N workers actually start.Full write-up in #500.
Changes
entrypoint.sh— the uvicorn path always runs a single worker (--workers 1). IfAPI_NUM_WORKERSis set to a non-1value, it logs a warning pointing operators to Ray Serve instead of silently misbehaving.charts/openrag-stack/values.yaml— removed the deadAPI_NUM_WORKERS: "8". The chart setsENABLE_RAY_SERVE: "true", which takes theapi.pybranch in the entrypoint and never readsAPI_NUM_WORKERS— so it was misleading, not live..env.example/docs/.../env_vars.md— dropped the knob and documented the correct scaling path.The correct way to scale
Scale the HTTP layer with Ray Serve — N replicas inside one shared Ray cluster, so the named actors stay singletons:
This is already what the Helm chart does. For multi-node, see the Ray cluster deployment guide.
Notes
API_NUM_WORKERS > 1will now correctly run one worker and print a warning.Closes #500
Summary by CodeRabbit
Documentation
API_NUM_WORKERS..env.examplecomments directing users to scale usingENABLE_RAY_SERVE=trueandRAY_SERVE_NUM_REPLICAS.Chores
API_NUM_WORKERSis set to a value other than"1", the app now warns that it’s ignored.