fix(deploy): remove API_NUM_WORKERS footgun, scale via Ray Serve #501

Merged
Ahmath-Gadji merged 2 commits into main from fix/api-num-workers-footgun on Jun 18, 2026
Conversation

@Ahmath-Gadji Ahmath-Gadji commented Jun 17, 2026

Collaborator

What & why

API_NUM_WORKERS is a footgun. In the uvicorn deployment path it fed uvicorn --workers N, but the app calls ray.init() at import time (openrag/api.py), so every uvicorn worker is a separate process that starts its own isolated Ray cluster with duplicate named actors (Indexer, Vectordb, TaskStateManager, the loader pools). Anything > 1 silently breaks the shared-actor architecture: task state fragments across clusters, multiple Indexer/Vectordb actors contend on the same Milvus collection / Postgres DB, and resources multiply N×.

Why it surfaced now

It was a no-op until v1.1.12. The entrypoint used to always pass --reload, and uvicorn dispatches should_reload before workers > 1, so --reload forced a single worker regardless of the value. Gating --reload behind UVICORN_RELOAD=true (#478, "N8") unmasked the setting — now all N workers actually start.
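
The masking effect can be modeled in a few lines of shell. This is a simplified sketch of the dispatch order described above, not uvicorn's actual code; the helper name `select_run_mode` and its output strings are invented for illustration.

```shell
#!/usr/bin/env bash
# Simplified model of the order in which uvicorn picks a run mode.
# select_run_mode is a hypothetical helper, not part of uvicorn or OpenRAG.
select_run_mode() {
    local reload="$1"   # "true" when --reload is passed
    local workers="$2"  # value given to --workers

    if [ "$reload" = "true" ]; then
        # Reload is checked first, so the worker count is never consulted.
        echo "reload: 1 worker"
    elif [ "$workers" -gt 1 ]; then
        echo "multiprocess: ${workers} workers"
    else
        echo "single process: 1 worker"
    fi
}

select_run_mode true 8    # --reload always passed: worker count masked
select_run_mode false 8   # --reload gated off: all 8 workers start
```

With the reload branch taken first, any worker count is harmless; once reload is off, the same value starts N processes, each of which would run ray.init() on import.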

Full write-up in #500.

Changes

  • entrypoint.sh — the uvicorn path always runs a single worker (--workers 1). If API_NUM_WORKERS is set to a non-1 value, it logs a warning pointing operators to Ray Serve instead of silently misbehaving.
  • charts/openrag-stack/values.yaml — removed the dead API_NUM_WORKERS: "8". The chart sets ENABLE_RAY_SERVE: "true", which takes the api.py branch in the entrypoint and never reads API_NUM_WORKERS — so it was misleading, not live.
  • .env.example / docs/.../env_vars.md — dropped the knob and documented the correct scaling path.
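
The entrypoint.sh change in the first bullet could look roughly like the sketch below. It is an illustration of the described behavior under assumptions, not the merged script: the function name, the warning wording, and the `api:app` application path are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the uvicorn branch; not the project's actual entrypoint.sh.

# Warn on stderr when API_NUM_WORKERS is set to anything other than "1".
warn_if_workers_ignored() {
    local requested="${API_NUM_WORKERS:-}"
    if [ -n "$requested" ] && [ "$requested" != "1" ]; then
        echo "WARNING: API_NUM_WORKERS=${requested} is ignored; uvicorn runs a single worker." >&2
        echo "WARNING: scale with ENABLE_RAY_SERVE=true and RAY_SERVE_NUM_REPLICAS instead." >&2
    fi
}

warn_if_workers_ignored

# The worker count is pinned, regardless of API_NUM_WORKERS.
# "api:app" is a placeholder; the command is echoed here rather than executed.
echo "uvicorn api:app --workers 1"
```

Leaving the variable unset, or setting it to 1, produces no warning, which matches the "no behavior change for single-worker deployments" note.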

The correct way to scale

Scale the HTTP layer with Ray Serve — N replicas inside one shared Ray cluster, so the named actors stay singletons:

ENABLE_RAY_SERVE=true
RAY_SERVE_NUM_REPLICAS=4

This is already what the Helm chart does. For multi-node, see the Ray cluster deployment guide.
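
As a rough illustration of how these two variables relate to the uvicorn path, the sketch below models the branch selection in the entrypoint. The function name and the printed strings are hypothetical, and treating an unset ENABLE_RAY_SERVE as "false" is an assumption rather than something the PR states.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how an entrypoint might pick its launch path.
# The printed strings are placeholders, not OpenRAG's real invocations.
select_launch_path() {
    if [ "${ENABLE_RAY_SERVE:-false}" = "true" ]; then
        # Ray Serve path: replicas live inside one shared Ray cluster.
        echo "ray-serve replicas=${RAY_SERVE_NUM_REPLICAS:-unset}"
    else
        # uvicorn path: pinned to a single worker.
        echo "uvicorn workers=1"
    fi
}

ENABLE_RAY_SERVE=true
RAY_SERVE_NUM_REPLICAS=4
select_launch_path
```

Only the Ray Serve branch reads a replica count, which is why API_NUM_WORKERS was dead configuration in the Helm chart.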

Notes

  • No behavior change for existing single-worker deployments (the common case). Deployments that set API_NUM_WORKERS > 1 will now correctly run one worker and print a warning.
  • No migration needed.

Closes #500

Summary by CodeRabbit

  • Documentation

    • Clarified that HTTP scaling is controlled via Ray Serve replicas (with sample Ray Serve configuration) rather than Uvicorn worker counts.
    • Updated environment variable documentation and example env files to reflect single Uvicorn worker behavior and removed guidance for API_NUM_WORKERS.
    • Added chart and .env.example comments directing users to scale using ENABLE_RAY_SERVE=true and RAY_SERVE_NUM_REPLICAS.
  • Chores

    • Enforced running exactly one Uvicorn worker when not using Ray Serve.
    • If API_NUM_WORKERS is set to a value other than "1", the app now warns that it’s ignored.

The uvicorn deployment path fed API_NUM_WORKERS into `uvicorn --workers N`,
but the app calls ray.init() at import time, so each extra worker starts its
own isolated Ray cluster with duplicate named actors (Indexer, Vectordb,
TaskStateManager), fragmenting task state and vector-DB access.

The flag was silently ignored until v1.1.12 because the entrypoint always
passed --reload (which forces a single uvicorn worker); gating --reload behind
UVICORN_RELOAD=true (PR #478, N8) unmasked it.

- entrypoint.sh: always run a single uvicorn worker; warn if API_NUM_WORKERS
  is set to a non-1 value, pointing operators to Ray Serve.
- charts: drop the dead API_NUM_WORKERS: "8" (the chart runs Ray Serve, which
  takes the api.py branch and never reads it).
- .env.example / docs: remove the knob and document Ray Serve
  (ENABLE_RAY_SERVE + RAY_SERVE_NUM_REPLICAS) as the HTTP scaling path.

Closes #500

coderabbitai Bot commented Jun 17, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b2d193aa-924b-479a-879b-b6e427792979

📥 Commits

Reviewing files that changed from the base of the PR and between 2ca0c6f and 319bc5c.

📒 Files selected for processing (2)
  • docs/assets/env_example.env
  • docs/assets/env_linux_gpu.env
✅ Files skipped from review due to trivial changes (2)
  • docs/assets/env_example.env
  • docs/assets/env_linux_gpu.env

Walkthrough

Removes API_NUM_WORKERS multi-worker support from the uvicorn startup path. entrypoint.sh now unconditionally passes --workers 1 to uvicorn and emits a stderr warning when API_NUM_WORKERS is set to a non-1 value. Related comments are added to .env.example and charts/openrag-stack/values.yaml, and env_vars.md is updated to document Ray Serve as the correct HTTP scaling path and remove the API_NUM_WORKERS entry.

Changes

Single Worker Enforcement and Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Enforce single uvicorn worker and warn on API_NUM_WORKERS (entrypoint.sh) | The non-Ray-Serve startup path always runs uvicorn with --workers 1. If API_NUM_WORKERS is set to any value other than "1", a warning is printed to stderr that it is ignored. The previous conditional --workers ${API_NUM_WORKERS} mapping is removed. |
| Update env examples, Helm values, and env-vars docs (.env.example, charts/openrag-stack/values.yaml, docs/content/docs/documentation/env_vars.md, docs/assets/env_example.env, docs/assets/env_linux_gpu.env) | Adds inline comments to environment example files and values.yaml clarifying the single-worker design and pointing to ENABLE_RAY_SERVE/RAY_SERVE_NUM_REPLICAS. Expands the Ray Serve section in env_vars.md with actor-initialization rationale, a sample config snippet, and a link to distributed Ray cluster docs. Removes the API_NUM_WORKERS row from the FastAPI table. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 One worker, no more, no less—
The Ray actors share a single nest.
API_NUM_WORKERS? A warning now rings,
"Use Ray Serve replicas for scaling things!"
Hoppity-fix, the footgun is gone! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: removing API_NUM_WORKERS and establishing Ray Serve as the scaling mechanism. |
| Linked Issues check | ✅ Passed | All changes directly address issue #500's objectives: single-worker enforcement in entrypoint.sh, warning on API_NUM_WORKERS mismatch, removal from Helm chart and documentation, and Ray Serve scaling guidance. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: entrypoint.sh, environment files, Helm values, and documentation updates align with the stated PR objectives and issue requirements. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@entrypoint.sh`:
- Line 31: The uvicorn port binding references a misspelled environment variable,
APP_iPORT (with a stray lowercase 'i'), instead of APP_PORT. Change the
environment variable reference in the uv run command from APP_iPORT to APP_PORT
so that the configured application port is used instead of always falling back
to the default port 8080.
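
Why the typo always yields port 8080 can be shown with shell default expansion. This sketch assumes the entrypoint uses a `${VAR:-default}` style fallback, which the comment above implies but does not show; the value 9000 is an arbitrary example.

```shell
#!/usr/bin/env bash
# Illustration only: a misspelled variable name silently falls back to the default.
unset APP_iPORT        # nothing ever sets the misspelled name
APP_PORT=9000          # what an operator configured (example value)

typo_port="${APP_iPORT:-8080}"    # misspelled name is unset, so the default wins
fixed_port="${APP_PORT:-8080}"    # correct name picks up the configured value

echo "with typo: ${typo_port}"
echo "fixed:     ${fixed_port}"
```

Because the expansion supplies a default, the misspelling raises no error, which is how it can go unnoticed.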

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1f02c847-0de0-4cc6-9acb-510604375ffd

📥 Commits

Reviewing files that changed from the base of the PR and between 7c2a35b and 0e5687b.

📒 Files selected for processing (4)
  • .env.example
  • charts/openrag-stack/values.yaml
  • docs/content/docs/documentation/env_vars.md
  • entrypoint.sh

Comment thread entrypoint.sh
.env.example removed the API_NUM_WORKERS knob, but its two hand-maintained
mirrors under docs/assets/ (env_example.env, env_linux_gpu.env) still
advertised it with the old, now-incorrect description. These files are
embedded in the quickstart docs, so users following them would still copy
the retired knob. Apply the same comment as .env.example pointing to the
Ray Serve scaling path.
@Ahmath-Gadji Ahmath-Gadji force-pushed the fix/api-num-workers-footgun branch from 2ca0c6f to 319bc5c Compare June 17, 2026 15:42
@Ahmath-Gadji Ahmath-Gadji merged commit 1b189a2 into main Jun 18, 2026
4 checks passed
@Ahmath-Gadji Ahmath-Gadji deleted the fix/api-num-workers-footgun branch June 18, 2026 07:41
@Ahmath-Gadji Ahmath-Gadji added the fix Fix issue label Jun 18, 2026

Labels

fix Fix issue

Development

Successfully merging this pull request may close these issues.

API_NUM_WORKERS is a footgun: spawns N isolated Ray clusters (and was silently ignored before v1.1.12)

1 participant