feat: local-agent bridge with pluggable runtime adapters#452
feat: local-agent bridge with pluggable runtime adapters#452TatsuKo-Tsukimi wants to merge 15 commits intodataelement:mainfrom
Conversation
Adds a reverse-WebSocket bridge that lets users run local agents
(Claude Code, OpenClaw daemon, Hermes) from their own machine while
keeping the Clawith platform as the control plane.
Server
- /ws/bridge endpoint (backend/app/api/bridge_ws.py): per-agent WS
channel; bridge advertises available adapters, server dispatches
sessions via SessionStartFrame.adapter.
- local_agent/ service package: adapter registry, protocol frames,
session dispatcher.
- Installer endpoint on /agents/{id}/bridge-installer: renders a
platform-specific installer with the agent's API key baked in.
Windows returns a single pristine clawith-bridge.exe + JSON config
trailer (magic CLWB!END); Unix returns a bash script that installs
the pip package and registers launchd/systemd.
- agent_type='openclaw' now carries a bridge_adapter field
(claude_code | openclaw | hermes) chosen at creation time. Server
strictly matches the bridge's advertised adapters and fails with
a clear error otherwise, instead of silently falling back.
Bridge package (bridge/)
- clawith_bridge: Python package with three adapters and a reverse-WS
session manager.
- PyInstaller spec that produces a single clawith-bridge.exe.
- baked_config trailer reader: on first launch from the
setup.exe, extracts server/token/adapter, runs the install flow,
strips the trailer from the installed copy, and registers a user
scheduled task.
- install_windows.py: copies the exe to %LOCALAPPDATA%/Clawith/bin/,
writes ~/.clawith-bridge.toml with only the chosen adapter enabled,
registers a user-scope scheduled task that auto-starts at logon.
Frontend
- AgentCreate: 4 flat runtime cards (Platform Hosted / Claude Code /
OpenClaw / Hermes). Non-native cards set agent_type='openclaw' with
different bridge_adapter values.
- AgentDetail: badge shows "Bridge . <Runtime> . Lab".
- OpenClawSettings: bridge installer download block with a readonly
Runtime line so users can confirm which adapter their installer
will enable.
- i18n keys for wizard.runtime.* and wizard.bridge.* in en/zh.
Migrations
- add_bridge_mode: opt-in flag for bridge-style agents.
- add_bridge_adapter: new column, backfills existing openclaw agents
to claude_code (matches current de-facto behavior).
Build
- bridge/clawith-bridge.exe is ignored and rebuilt via
pyinstaller clawith-bridge.spec, then copied into
backend/app/static/bridge/ before shipping.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
session_dispatcher.py logs these two events when a bridge connects or disconnects, but the enum never included them — so every attach/detach raised InvalidTextRepresentationError in logs and the activity row was dropped. Adds both values via idempotent ALTER TYPE ... ADD VALUE IF NOT EXISTS, matching the pattern used in add_agentbay_enum_value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc91ed8e39
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if getattr(agent, "bridge_mode", "disabled") == "disabled": | ||
| agent.bridge_mode = "enabled" | ||
|
|
||
| await db.commit() |
There was a problem hiding this comment.
Defer API key rotation until installer payload is built
Committing the regenerated key before calling render_installer means a failed download (for example, Windows binary missing and render_installer raising FileNotFoundError) still invalidates the currently running bridge token. In that failure path the API returns 503, but the old key is already revoked, so operators lose connectivity without receiving a usable installer. Move the commit after successful payload generation (or roll back on generation errors) so failed downloads are side-effect free.
Useful? React with 👍 / 👎.
| bridge_adapter=( | ||
| (data.bridge_adapter or "claude_code") | ||
| if (data.agent_type or "native") == "openclaw" | ||
| else None |
There was a problem hiding this comment.
Validate bridge adapter before persisting OpenClaw agents
The create flow persists data.bridge_adapter directly for OpenClaw agents without constraining it to supported adapters. A client can store any string, which later causes chat dispatch to fail every turn with "Selected runtime ... is not available" while installer generation quietly falls back to claude_code, creating a persistent runtime mismatch for that agent. Add server-side validation/enumeration at creation time to reject unknown adapter values.
Useful? React with 👍 / 👎.
…m gaps
Adds a live bridge-connection indicator so users can tell at a glance
whether their local bridge is running, instead of discovering offline
state by failing to chat.
Backend
- GET /agents/{id}/bridge-status: reuses check_agent_access, returns
{connected, applicable, bridge_version, adapters, connected_at,
active_sessions}. Non-bridge agents return applicable=false.
Frontend
- agentApi.bridgeStatus() client method.
- AgentDetail polls every 5s (only for agent_type=openclaw, only
while page is foregrounded) and renders a colored dot inside the
existing runtime badge — green when connected, red when offline,
transparent while the first request is in flight.
- Tooltip shows bridge version + active session count when connected,
and a hint to install/start the bridge when offline.
Enum sweep (fix)
- Also adds local_session_start / local_session_done /
local_session_error / reverse_tool_call / reverse_tool_result /
bridge_installer_download to activity_action_enum. These were all
referenced from bridge code but missing from the enum, so every
bridge session was silently losing its audit trail.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After creating an openclaw agent the runtime was frozen: even though the
server/bridge supports claude_code/openclaw/hermes at runtime, the only
way to change adapter was to delete the agent and re-create it. This
change exposes runtime selection on the agent's OpenClaw Settings page.
- schemas.AgentUpdate: add bridge_adapter: str | None with regex guard
- agents.update_agent: silently drop bridge_adapter for non-openclaw
agents instead of 422'ing (idempotent if UI sends it for native)
- OpenClawSettings.tsx: replace readonly runtime badge with a 3-button
selector; on change, PATCH /agents/{id}, invalidate cache, and show a
yellow notice telling the user to redownload the installer OR edit
~/.clawith-bridge.toml to set enabled=true under [<new_adapter>] and
restart the bridge. The server side of the handshake picks up the new
adapter on next SessionStart regardless of which path the user takes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…fail
When the installed bridge advertises an adapter different from what the
agent is configured for, server-side dispatch rejects with a chat-time
error. That's correct but late — users only discover the problem by
failing to chat, and the remediation (redownload or edit TOML) lives in
a chat bubble they likely dismissed.
Two changes, both driven by /agents/{id}/bridge-status polling:
- OpenClawSettings: RuntimeSelector now accepts the live bridge_status
and renders a red banner whenever bridge.adapters doesn't include the
selected runtime. The banner lists what the bridge actually advertises,
points at the installer below, and shows the TOML escape hatch. The
previous "justChanged" yellow banner is kept but only fires when the
bridge is offline at the moment of switching (no live signal to rely on).
- AgentDetail bridge badge: three-state dot. Green = online + match,
yellow = online + mismatch (with amber glow), red = offline. Tooltip
spells out the mismatch and points at Settings. The prior two-state
(green/red) would silently light green even when the agent's adapter
was unreachable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 64cb73d2b6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| r.cancel() | ||
| watcher.cancel() | ||
| try: | ||
| await proc.wait() |
There was a problem hiding this comment.
Kill timed-out subprocess before waiting for exit
When SubprocessAdapter.start_session hits its timeout path, it breaks out of the event loop but then unconditionally waits on proc.wait() without terminating the child first. If the underlying CLI keeps running after output stalls, the bridge session can hang instead of completing, consuming a session slot and delaying session.error/done until a separate outer timeout cancels it. Ensure the process is terminated (or cancel() is invoked) before this wait so timeout_s actually bounds execution.
Useful? React with 👍 / 👎.
| @router.get("/api/admin/bridge/status") | ||
| async def bridge_status(): |
There was a problem hiding this comment.
Enforce admin auth on bridge status endpoint
This route is exposed as /api/admin/bridge/status but has no authentication/authorization dependency, so unauthenticated callers can list connected bridge metadata (agent IDs, adapters, versions, active sessions). That leaks internal runtime topology and agent identifiers from what is clearly intended as an admin surface; it should be protected by the existing admin auth path before being publicly routable.
Useful? React with 👍 / 👎.
Two issues from external review on the bridge PR:
- Remove GET /api/admin/bridge/status from bridge_ws.py. It had no auth
enforcement and exposed every connected bridge's agent_id, version,
and active sessions to anyone who could reach the backend. The
comment admitted "V1 doesn't do auth, add later" — 'later' was never
going to be safer than now. The per-agent /agents/{id}/bridge-status
endpoint added with the bridge status badge already covers the
legitimate use case and is auth-gated.
- Require PUBLIC_BASE_URL for bridge-installer downloads instead of
falling back to the request's Host / X-Forwarded-Host header. An
authenticated attacker could set those to an attacker-controlled
hostname and have the installer bake that into the bridge's
dial-home URL. Validation now runs before the DB mutation, so a
misconfiguration doesn't rotate the API key and leave the agent
wedged. .env.example calls out the requirement explicitly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three focused test files covering the code paths most likely to silently regress the bridge feature: - test_bridge_installer_template.py: render_installer(adapter=...) with each of claude_code/openclaw/hermes produces a TOML section block with exactly that adapter's `enabled = true` and the others `false`; unknown adapters fall back to claude_code; agent_name with CR/LF is flattened so it can't escape the `# Agent:` comment line. - test_agent_schema_bridge_adapter.py: AgentCreate/AgentUpdate reject typos, wrong case, wrong separators, and injection-style values. - test_bridge_migrations_idempotent.py: loads each enum migration by file path (alembic/versions isn't a package), mocks op.execute, and asserts every emitted statement carries ADD VALUE IF NOT EXISTS — the clause that lets the backend boot re-run migrations safely on every container restart. Also cross-checks that every bridge action type logged by the session code is present in the enum. Also adds the same regex constraint to AgentCreate.bridge_adapter that was already on AgentUpdate.bridge_adapter — previously Create accepted arbitrary strings, so a typo could end up in the DB and only surface at chat-time when the dispatcher couldn't find a matching adapter. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dfe781f76d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| except FileNotFoundError as e: | ||
| yield SessionEvent(kind="stderr_chunk", payload={"text": f"{argv[0]!r} not found: {e}"}) | ||
| return |
There was a problem hiding this comment.
Surface missing adapter executable as session error
When a local runtime binary is missing (for example, claude not installed), this branch emits a stderr_chunk and returns normally, which makes SessionManager._run treat the run as successful and send session.done with exit_code 0 instead of session.error. In practice, bridge startup misconfiguration is reported as a blank/"successful" turn, so operators get no reliable failure signal. This path should raise or otherwise propagate a terminal error state so the session is marked failed.
Useful? React with 👍 / 👎.
| } else if (data.role && data.content) { | ||
| // Legacy format: {role, content} | ||
| setMessages(prev => [...prev, { role: data.role, content: data.content }]); | ||
| } else { | ||
| // Unknown event with no role/content — e.g. Anthropic SDK |
There was a problem hiding this comment.
Render websocket error frames instead of dropping them
The new bridge flow sends explicit {"type":"error","content":...} frames for runtime mismatch, offline bridge, and dispatch failures (backend/app/api/websocket.py), but this parser now drops any frame without role/content in the fallback branch. As a result, users get no visible message for these failures and the chat appears to silently stall. Handle data.type === "error" as a user-visible message before this fallback.
Useful? React with 👍 / 👎.
| Type=simple | ||
| ExecStart=%h/.local/bin/clawith-bridge | ||
| Restart=on-failure |
There was a problem hiding this comment.
Avoid hardcoding ~/.local bridge path in Linux unit
The installer can intentionally skip pip install --user when import clawith_bridge already works (dev mode or prior install), but the generated Linux systemd unit always runs %h/.local/bin/clawith-bridge. In those skip-install environments that path may not exist, so the service fails immediately even though the package is importable. Use a launcher that works regardless of install location (for example python3 -m clawith_bridge) or resolve the actual executable path during install.
Useful? React with 👍 / 👎.
…ut kill Two independent error-path bugs flagged in the PR review, both visible only when something else has already gone wrong, both leaving the system in a wedged state with no user-driven recovery: 1. bridge-installer: defer API key rotation until render_installer() succeeds. The Windows exe ships bundled in the backend image; if an operator forgets to build it, render_installer raises FileNotFoundError and we return 503. But we used to commit the rotated key BEFORE building, so that 503 left the caller's already-running bridge with a dead token and no installer — the only recovery was a manual admin key reset. Now we generate the candidate key, build the payload, and only then commit the rotation + bridge_mode auto-enable. 2. SubprocessAdapter: on timeout, the finally block did `await proc.wait()` without first terminating the child. If the CLI kept running past timeout_s (e.g. hanging on network I/O), `wait()` blocked forever and the session slot stayed held until session_manager's outer cancel fired — i.e. timeout_s wasn't actually an upper bound. Extracted the terminate→grace→kill sequence into `_terminate_proc()` and call it from both the streaming finally block and cancel(), so timeout_s is now an actual bound regardless of what the child is doing. Note: the bridge-side fix changes code that gets baked into the bundled Windows exe (`backend/app/static/bridge/clawith-bridge.exe`). That exe will need to be rebuilt + dropped back in before the fix reaches end users on Windows; the source fix is what's under review here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 72e6e522de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| except Exception as e: | ||
| yield SessionEvent(kind="stderr_chunk", payload={"text": f"daemon start failed: {e}"}) | ||
| return |
There was a problem hiding this comment.
Propagate daemon start failures as session errors
When start_session_request raises (for example Hermes/OpenClaw daemon offline), this branch converts it into a stderr_chunk and returns normally. SessionManager._run treats that as a successful adapter completion and emits session.done (exit code 0), so failed local-session dispatches are reported as success instead of a terminal error, which makes runtime outages hard to detect and retries misleading.
Useful? React with 👍 / 👎.
| if asyncio.get_event_loop().time() - start_t > timeout_s: | ||
| yield SessionEvent(kind="stderr_chunk", payload={"text": f"timeout after {timeout_s}s"}) | ||
| break |
There was a problem hiding this comment.
Cancel daemon jobs when session timeout is hit
On timeout this loop only emits a stderr_chunk and breaks, but it never calls cancel_request(task_id). That leaves the daemon task running in the background even though the bridge session is considered finished, which can continue mutating state after the user-facing session ended and also reports completion through session.done instead of a hard failure path.
Useful? React with 👍 / 👎.
| try: | ||
| # Await the returned future directly — avoids a race with | ||
| # session.done popping the session from bridge.sessions. | ||
| _final_text = await _aio_br.wait_for(_fut, timeout=1800) |
There was a problem hiding this comment.
Handle abort control frames during bridge session waits
In the OpenClaw bridge path the handler blocks on _fut for up to 1800s without a concurrent control-frame reader, so frontend stop actions ({"type":"abort"}) are not processed until the local session already ends. Users cannot cancel long-running bridge executions, unlike the native LLM path that actively listens for abort messages.
Useful? React with 👍 / 👎.
Covers three paths that shipped without regression coverage:
- GET /agents/{id}/bridge-status response shape (native vs openclaw,
connected vs disconnected, optional-field fallbacks)
- PATCH /agents/{id} silent-drop guard for native agents
- POST /agents/ default bridge_adapter resolution (claude_code for
openclaw, NULL for native even if client sends a value)
72e6e52 to
824a88f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 824a88f463
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| yield SessionEvent(kind=kind, payload=evt.get("payload") or {}) | ||
| if kind in ("done", "finished"): | ||
| return |
There was a problem hiding this comment.
Map OpenClaw terminal events to valid protocol kinds
When the local OpenClaw daemon emits {"kind":"done"} or {"kind":"finished"}, this branch yields that raw kind and only then returns. SessionManager._run wraps every event in SessionEventFrame, but done/finished are not allowed by the protocol EventKind literal, so frame construction raises and the session is reported as session.error instead of completing successfully. This will trigger on exactly the terminal event shapes this adapter is trying to support.
Useful? React with 👍 / 👎.
| async def fail_all(self, reason: str) -> None: | ||
| for sid, running in list(self._sessions.items()): | ||
| running.task.cancel() | ||
| self._sessions.clear() |
There was a problem hiding this comment.
Cancel adapter jobs when bridge disconnects
On connection teardown, BridgeClient._read_loop calls SessionManager.fail_all("bridge disconnected"), but this method only cancels asyncio tasks and clears _sessions. It never calls running.adapter.cancel(...), so daemon-backed jobs (Hermes/OpenClaw) are left running on the local daemon even though the bridge is gone and the server has already abandoned those sessions. This creates orphaned work and can continue mutating state after disconnect.
Useful? React with 👍 / 👎.
| async def _git_diff_summary(snap: _GitSnapshot) -> dict[str, Any]: | ||
| # Include both committed changes (HEAD..HEAD) — N/A here, snap.head == current HEAD | ||
| # and uncommitted changes (working tree + index vs. HEAD). | ||
| rc, out, err = await _run(["git", "diff", "--numstat", snap.head], cwd=snap.cwd) |
There was a problem hiding this comment.
Diff against a real session baseline, not just HEAD
The git diff summary uses only the commit hash captured at session start and runs git diff --numstat <head> at session end. If the repo was already dirty before the session began, all pre-existing changes are counted in every diff_summary, inflating files_changed/insertions/deletions and misattributing edits to the current run. A per-session baseline of worktree/index state is needed to report only changes introduced during this session.
Useful? React with 👍 / 👎.
Downloading an installer no longer rotates the agent's API key. The
plaintext key is now stored in `agents.api_key` alongside the existing
hash; download reuses it, so re-downloading (for a different runtime,
another platform, another machine) keeps any running bridge online.
Legacy agents without plaintext mint + dual-write once on their first
download — a one-time opportunistic upgrade preserved behind the same
rollback-safe pattern as before (no DB mutation until render succeeds).
Explicit rotation stays on `POST /agents/{id}/api-key`, which now
dual-writes both columns so gateway's existing plaintext/hash fallback
stays consistent.
Tests: 5 new cases in `test_install_rotate_decoupling.py` covering
reuse, legacy upgrade, render-failure rollback, explicit rotate
dual-write, and openclaw-create dual-write.
Frontend: removes the "download regenerates the key, bridge will
disconnect" warning from OpenClawSettings; the existing "Regenerate
API Key" section already carries that warning and remains the
explicit rotation UI.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Design-only draft for Phase 2 — no code, schema, or protocol changes.
Motivates the next step after the install/rotate decouple: collapsing
the 1:1 bridge-per-agent fanout so one bridge process on a machine can
serve every agent the owner runs there. Covers data model (`bridges`
table + `agents.bridge_id`), protocol extension (`bridge.register` with
`bridge_id`), auth path, session dispatcher, installer flow
("Add a machine" vs per-agent), legacy migration, rollout (feature
flag `ENABLE_PER_MACHINE_BRIDGE`), and open questions left for the
implementer.
Placed at repo root next to ARCHITECTURE_SPEC_EN.md / AGENTS.md since
`docs/` is .gitignored.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9431ae0a9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "type": "done", | ||
| "role": "assistant", | ||
| "content": _persist_text, | ||
| } |
There was a problem hiding this comment.
Send fallback text when bridge session ends with error
When the OpenClaw bridge path fails, this done payload sends an empty content and puts the real failure only in session_error. Clients that only render done.content (for example the Agent Detail chat websocket handler) will show a blank assistant reply, so timeout/disconnect/runtime failures become invisible to users even though the backend has an error string. Populate content with a readable fallback when _session_err is set (or also emit a standard error frame) to keep existing clients informative.
Useful? React with 👍 / 👎.
AgentCreate flow now shows Platform Hosted vs On My Machine at Tier 1, and reveals Claude Code / OpenClaw / Hermes as Tier 2 only after On My Machine is selected. The prior flat 4-card layout conflated "where the agent runs" with "which local runtime to use". The initial bridgeAdapter state is now empty, and submit is guarded with a clearer "pick a local agent first" error.
The prior OpenClaw adapter was a placeholder that assumed an HTTP/SSE daemon at :9000 — OpenClaw's actual runtime is a WebSocket JSON-RPC gateway at :18789 with Ed25519 device-challenge auth. Rather than re-implementing all of that, we now spawn `openclaw acp`, the CLI subcommand that wraps the gateway and speaks Agent Client Protocol (JSON-RPC 2.0) over stdio. OpenClaw handles the auth/framing internally; we just route messages. - new: ACPSubprocessAdapter base in adapters/acp_base.py — JSON-RPC read/write loop, request/response correlation, session/update notification mapping to Clawith EventKind, agent-initiated request handling (auto-approves session/request_permission so prompts can proceed) - rewrite: OpenClawAdapter now extends ACPSubprocessAdapter with a thin build_acp_argv that threads config.extra (url, token_file, session_label, provenance, verbose) into `openclaw acp` flags - fix: DaemonAdapter.start_session now re-raises after emitting the diagnostic stderr_chunk. Before, a daemon connection failure silently yielded and returned — the session manager interpreted it as a clean completion and sent SessionDoneFrame(exit=0, final=""), presenting as a green status and an empty reply. Re-raising surfaces session.error instead. (Hermes still uses DaemonAdapter — this fix turns its current silent-empty-reply UX into a clear error until that adapter is also rewritten.) - Windows .cmd shim handling generalized in acp_base.resolve_stdio_executable, parallel to claude_code.py's resolver - 25 tests covering notification mapping, executable resolution, end-to-end subprocess round-trip with a fake ACP agent, and daemon-failure regression Hermes is intentionally left on its placeholder this round — can't be E2E verified without a local Hermes install. Will follow in a separate PR.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9cd7b74989
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| raw_key = f"oc-{secrets.token_urlsafe(32)}" | ||
| agent.api_key = raw_key | ||
| agent.api_key_hash = hashlib.sha256(raw_key.encode()).hexdigest() | ||
| await db.commit() |
There was a problem hiding this comment.
Revoke active bridge sessions when rotating API keys
Updating api_key_hash here does not invalidate an already-connected bridge: /ws/bridge authenticates only once at connect time, and the in-memory dispatcher keeps serving that socket by agent_id afterward. In practice, rotating a compromised key still leaves the existing bridge fully authorized to run sessions until it disconnects, which defeats the expected revocation semantics of this endpoint.
Useful? React with 👍 / 👎.
| agent.api_key = raw_key | ||
| agent.api_key_hash = hashlib.sha256(raw_key.encode()).hexdigest() | ||
| agent.status = "idle" | ||
| agent.bridge_mode = "enabled" |
There was a problem hiding this comment.
Default new OpenClaw agents to fallback-capable bridge mode
New OpenClaw agents are forced to bridge_mode="enabled" at creation, but the chat websocket path rejects requests when no bridge is connected instead of queueing to gateway fallback (the fallback path is only used for auto/disabled). This makes fresh agents fail chat out of the box until operators manually install/connect a bridge, even though creation still returns API-key-based onboarding.
Useful? React with 👍 / 👎.
`/ws/bridge` authenticates only at the WebSocket upgrade — once a bridge is connected the read loop never re-checks the token. Rotating the hash alone left the old socket running sessions on a revoked credential until it disconnected on its own. Codex P1 flagged this. Fix: - `SessionDispatcher.detach_bridge` gains `close_ws` + `reason` params. When set, the server closes the WS with code 4001 (same class as upgrade-time auth failure) before failing in-flight sessions. Existing call site in `/ws/bridge`'s read-loop finally keeps the no-close default, so no behavior change there. - Rotate endpoint calls `detach_bridge(close_ws=True, reason="api_key_rotated")` after committing the new hash. Wrapped in try/except so an eviction failure does not mask the successful rotation from the operator. Tests pin both the dispatcher contract (close code, reason, in-flight session failure, no-op when unattached) and the integration (rotate endpoint invokes detach_bridge with the right args, still succeeds if eviction raises). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Clawith/backend/app/api/gateway.py
Line 39 in e765f9f
_get_agent_by_key’s “plaintext” branch compares the caller token to Agent.api_key_hash instead of Agent.api_key, which means a stored SHA-256 hash is accepted as a valid credential. In this codebase api_key_hash is returned in agent payloads (AgentOut), so any principal that can read an agent record can replay that hash to /gateway/* or /ws/bridge and fully impersonate the agent without the real key.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| await websocket.send_json(msg) | ||
| except Exception: |
There was a problem hiding this comment.
Cancel bridge session when chat websocket send fails
When the client socket drops mid-turn, _bridge_drain swallows the send_json failure and returns, but the outer handler still waits on _fut for up to 1800s and never cancels the dispatched local session. In that disconnect scenario, bridge work continues with no recipient, consuming bridge capacity and potentially applying side effects long after the user has gone away.
Useful? React with 👍 / 👎.
…ops crashing startup When the host-side ./ss-nodes.json is absent, Docker auto-creates the bind-mount source as a directory. The previous os.path.exists() check passed, then open() raised IsADirectoryError — surfaced as "Background task ss-local-proxy CRASHED" on every container start. isfile() treats the directory as absent and takes the clean skip branch.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 618ec2176c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if remaining <= 0: | ||
| yield SessionEvent(kind="stderr_chunk", payload={ | ||
| "text": f"timeout after {timeout_s}s", | ||
| }) | ||
| break |
There was a problem hiding this comment.
Raise ACP timeout as terminal session failure
When timeout_s is exceeded, this branch only emits a stderr_chunk and breaks, so start_session returns normally; SessionManager._run then reports session.done with success semantics instead of session.error. In long-running OpenClaw ACP sessions, a true timeout is therefore misreported as a successful completion, which hides failures and can produce misleading downstream chat state.
Useful? React with 👍 / 👎.
| existing_plaintext = getattr(agent, "api_key", None) | ||
| if existing_plaintext: | ||
| raw_key = existing_plaintext | ||
| needs_persist = False | ||
| else: | ||
| raw_key = f"oc-{secrets.token_urlsafe(32)}" | ||
| needs_persist = True |
There was a problem hiding this comment.
Serialize legacy installer key minting per agent
This legacy-upgrade path mints a new key whenever api_key is null, but it does so before any lock/recheck; two concurrent POST /agents/{id}/bridge-installer calls can each bake different keys, and the later commit overwrites api_key/api_key_hash, invalidating the earlier downloaded installer immediately. This can happen with parallel retries/tabs/admin actions and causes flaky first-time bridge setup for pre-migration agents.
Useful? React with 👍 / 👎.
Summary
Reverse-WebSocket bridge that lets users run local agents (Claude Code, OpenClaw daemon, Hermes) from their own machine while keeping the Clawith platform as the control plane.
claude_code(default) |openclaw|hermes. Stored asagents.bridge_adapter; server strictly matches against what the bridge advertises and fails with a clear error instead of silent fallback. Editable after creation (prompts re-download if the installed bridge doesn't advertise the new choice)./agents/{id}/bridge-statusand show an explicit "bridge connected but doesn't advertise X" warning before the user tries to chat. Three-state dot on AgentDetail (green = online+match, amber = online+mismatch, red = offline).CLWB!END, JSON blob, big-endian uint32 pristine_len) onto the pristineclawith-bridge.exe. On first double-click the bridge reads the trailer, copies itself to%LOCALAPPDATA%\Clawith\bin\, writes~/.clawith-bridge.tomlwith only the chosen adapter enabled, registers a user-scope scheduled task, then strips the trailer from the installed copy so the scheduler never re-runs the install flow.clawith_bridgeand registers launchd/systemd user service.Install / rotate decoupling (latest push)
Installer download no longer rotates the agent's API key. The plaintext key is now stored in
agents.api_keyalongside the existing hash; re-downloading an installer (for a different runtime, another platform, another machine) reuses it so any running bridge stays online. Explicit rotation is the pre-existingPOST /agents/{id}/api-keyendpoint, which now dual-writes both columns.add_agent_api_keyaddsagents.api_key VARCHAR(128)nullable. No backfill — legacy agents upgrade opportunistically on their first post-migration download (mint + dual-write once).download_bridge_installerreuses the stored plaintext when present and falls back to the old mint-and-persist path otherwise. The rollback-safe "build first, persist later" ordering from the earlier Codex fix is preserved.create_agent(openclaw branch) now dual-writes on creation so the first download already takes the reuse path.gateway._get_agent_by_key's existing plaintext/hash dual-path auth is unchanged — the new plaintext column hits the fast path.test_install_rotate_decoupling.py— 5 cases covering reuse, legacy upgrade, render-failure rollback, explicit rotate dual-write, openclaw-create dual-write.Next step — per-machine bridge (design only)
Phase 1 above fixes the download side of the UX problem but leaves the 1:1 bridge-per-agent installation fanout untouched.
PER_MACHINE_BRIDGE_DESIGN.mdin this PR drafts the follow-up: collapse N bridges on a machine down to one, with agents bound to bridges via a newagents.bridge_idcolumn. It's a design-only doc — no schema, protocol, or code changes. See the doc for data model, auth path, dispatcher, rollout flag, and open questions.Security hardening (Codex review)
PUBLIC_BASE_URLis now required for installer issuance — no moreHost/X-Forwarded-Hostfallback (was a reflection-injection vector)./api/admin/bridge/statusendpoint removed (never shipped with auth).render_installer()succeeds — if the bundled bridge exe is missing (503), the existing bridge's key keeps working instead of being silently invalidated. (Subsumed by the install/rotate decouple above — download no longer rotates at all in the common path.)SubprocessAdapternow emits a boundedterminate → grace → killcycle on timeout. Previouslyproc.wait()could block past the declared timeout.AgentCreate.bridge_adapternow regex-validated (same pattern asAgentUpdate).What's in the diff
backend/app/api/bridge_ws.py— per-agent/ws/bridgeendpointbackend/app/api/agents.py—/agents/{id}/bridge-status, required-PUBLIC_BASE_URL installer path, bridge_adapter guard inupdate_agent, install/rotate decouple (dual-storage reuse + legacy upgrade)backend/app/models/agent.py— newapi_keyplaintext column alongsideapi_key_hashbackend/app/services/local_agent/— adapter registry, protocol frames, session dispatcher, installer templatesbridge/— standaloneclawith_bridgePython package + PyInstaller spec (three adapters, reverse-WS session manager, trailer reader, Windows installer,_terminate_procsubprocess helper)add_bridge_mode,add_bridge_adapter(backfills existing openclaw agents toclaude_code), twoALTER TYPE ... ADD VALUE IF NOT EXISTSmigrations for the activity-log enum, andadd_agent_api_key(plaintext column, no backfill)test_bridge_installer_template.py,test_agent_schema_bridge_adapter.py,test_bridge_migrations_idempotent.py,test_bridge_adapter_endpoints.py,test_install_rotate_decoupling.py— 60 passingPER_MACHINE_BRIDGE_DESIGN.md(no code)backend/app/static/bridge/clawith-bridge.exeis gitignored and rebuilt frombridge/viapyinstaller clawith-bridge.spec, then copied intobackend/app/static/bridge/before shipping.Test plan
alembic upgrade headapplies migrations idempotently; existingagent_type='openclaw'rows getbridge_adapter='claude_code';agent_type='native'rows keepNULL; newapi_keycolumn is NULL on pre-migration rowsPlatform Hostedcard → unchanged behavior (native LLM path)Claude Codecard → downloadclawith-bridge-setup.exe→ double-click →~/.clawith-bridge.tomlhas only[claude_code] enabled = true; scheduled taskClawithBridgeexists; bridge log showsadapters=['claude_code']; chat messages route through Claude Code CLIOpenClawcard → TOML has[openclaw] enabled = true, base_url = "http://127.0.0.1:9000"; local daemon at:9000handles messagesHermescard → TOML has[hermes] enabled = true, base_url = "http://127.0.0.1:7890"bridge_adapteron an existing openclaw agent → UI prompts re-download; serversession.startnow uses the new adapterPUBLIC_BASE_URL→/installerdownload returns 500 (no silent Host-header fallback)agents.api_key_hashunchanged between downloadsbash install-clawith-bridge.shon Linux registersclawith-bridge.serviceunder systemd --user🤖 Generated with Claude Code