Skip to content

feat: local-agent bridge with pluggable runtime adapters#452

Open
TatsuKo-Tsukimi wants to merge 15 commits intodataelement:mainfrom
TatsuKo-Tsukimi:feat/local-agent-bridge
Open

feat: local-agent bridge with pluggable runtime adapters#452
TatsuKo-Tsukimi wants to merge 15 commits intodataelement:mainfrom
TatsuKo-Tsukimi:feat/local-agent-bridge

Conversation

@TatsuKo-Tsukimi
Copy link
Copy Markdown

@TatsuKo-Tsukimi TatsuKo-Tsukimi commented Apr 22, 2026

Summary

Reverse-WebSocket bridge that lets users run local agents (Claude Code, OpenClaw daemon, Hermes) from their own machine while keeping the Clawith platform as the control plane.

  • Per-agent runtime selection: claude_code (default) | openclaw | hermes. Stored as agents.bridge_adapter; server strictly matches against what the bridge advertises and fails with a clear error instead of silent fallback. Editable after creation (prompts re-download if the installed bridge doesn't advertise the new choice).
  • Live mismatch surfacing: AgentDetail / OpenClawSettings poll /agents/{id}/bridge-status and show an explicit "bridge connected but doesn't advertise X" warning before the user tries to chat. Three-state dot on AgentDetail (green = online+match, amber = online+mismatch, red = offline).
  • One-click Windows installer: server bakes a config trailer (magic CLWB!END, JSON blob, big-endian uint32 pristine_len) onto the pristine clawith-bridge.exe. On first double-click the bridge reads the trailer, copies itself to %LOCALAPPDATA%\Clawith\bin\, writes ~/.clawith-bridge.toml with only the chosen adapter enabled, registers a user-scope scheduled task, then strips the trailer from the installed copy so the scheduler never re-runs the install flow.
  • Unix installer: bash script that pip-installs clawith_bridge and registers launchd/systemd user service.
  • Frontend: 4 flat runtime cards in AgentCreate (Platform Hosted / Claude Code / OpenClaw / Hermes); runtime badge + bridge dot on AgentDetail; runtime selector with live mismatch banner above the installer download in OpenClawSettings; en/zh i18n.

Install / rotate decoupling (latest push)

Installer download no longer rotates the agent's API key. The plaintext key is now stored in agents.api_key alongside the existing hash; re-downloading an installer (for a different runtime, another platform, another machine) reuses it so any running bridge stays online. Explicit rotation is the pre-existing POST /agents/{id}/api-key endpoint, which now dual-writes both columns.

  • New migration add_agent_api_key adds agents.api_key VARCHAR(128) nullable. No backfill — legacy agents upgrade opportunistically on their first post-migration download (mint + dual-write once).
  • download_bridge_installer reuses the stored plaintext when present and falls back to the old mint-and-persist path otherwise. The rollback-safe "build first, persist later" ordering from the earlier Codex fix is preserved.
  • create_agent (openclaw branch) now dual-writes on creation so the first download already takes the reuse path.
  • gateway._get_agent_by_key's existing plaintext/hash dual-path auth is unchanged — the new plaintext column hits the fast path.
  • Frontend: removed the "download regenerates the key, bridge will disconnect" warning from OpenClawSettings and clarified that explicit rotation is the separate "Regenerate API Key" button.
  • Tests (new): test_install_rotate_decoupling.py — 5 cases covering reuse, legacy upgrade, render-failure rollback, explicit rotate dual-write, openclaw-create dual-write.

Next step — per-machine bridge (design only)

Phase 1 above fixes the download side of the UX problem but leaves the 1:1 bridge-per-agent installation fanout untouched. PER_MACHINE_BRIDGE_DESIGN.md in this PR drafts the follow-up: collapse N bridges on a machine down to one, with agents bound to bridges via a new agents.bridge_id column. It's a design-only doc — no schema, protocol, or code changes. See the doc for data model, auth path, dispatcher, rollout flag, and open questions.

Security hardening (Codex review)

  • PUBLIC_BASE_URL is now required for installer issuance — no more Host / X-Forwarded-Host fallback (was a reflection-injection vector).
  • Unauthed /api/admin/bridge/status endpoint removed (never shipped with auth).
  • API key rotation on installer redownload is now deferred until after render_installer() succeeds — if the bundled bridge exe is missing (503), the existing bridge's key keeps working instead of being silently invalidated. (Subsumed by the install/rotate decouple above — download no longer rotates at all in the common path.)
  • SubprocessAdapter now emits a bounded terminate → grace → kill cycle on timeout. Previously proc.wait() could block past the declared timeout.
  • AgentCreate.bridge_adapter now regex-validated (same pattern as AgentUpdate).

What's in the diff

  • backend/app/api/bridge_ws.py — per-agent /ws/bridge endpoint
  • backend/app/api/agents.py/agents/{id}/bridge-status, required-PUBLIC_BASE_URL installer path, bridge_adapter guard in update_agent, install/rotate decouple (dual-storage reuse + legacy upgrade)
  • backend/app/models/agent.py — new api_key plaintext column alongside api_key_hash
  • backend/app/services/local_agent/ — adapter registry, protocol frames, session dispatcher, installer templates
  • bridge/ — standalone clawith_bridge Python package + PyInstaller spec (three adapters, reverse-WS session manager, trailer reader, Windows installer, _terminate_proc subprocess helper)
  • Migrations: add_bridge_mode, add_bridge_adapter (backfills existing openclaw agents to claude_code), two ALTER TYPE ... ADD VALUE IF NOT EXISTS migrations for the activity-log enum, and add_agent_api_key (plaintext column, no backfill)
  • Frontend: AgentCreate / AgentDetail / Chat / OpenClawSettings / i18n
  • Tests (new): test_bridge_installer_template.py, test_agent_schema_bridge_adapter.py, test_bridge_migrations_idempotent.py, test_bridge_adapter_endpoints.py, test_install_rotate_decoupling.py — 60 passing
  • Design: PER_MACHINE_BRIDGE_DESIGN.md (no code)

backend/app/static/bridge/clawith-bridge.exe is gitignored and rebuilt from bridge/ via pyinstaller clawith-bridge.spec, then copied into backend/app/static/bridge/ before shipping.

Test plan

  • alembic upgrade head applies migrations idempotently; existing agent_type='openclaw' rows get bridge_adapter='claude_code'; agent_type='native' rows keep NULL; new api_key column is NULL on pre-migration rows
  • Create agent with Platform Hosted card → unchanged behavior (native LLM path)
  • Create agent with Claude Code card → download clawith-bridge-setup.exe → double-click → ~/.clawith-bridge.toml has only [claude_code] enabled = true; scheduled task ClawithBridge exists; bridge log shows adapters=['claude_code']; chat messages route through Claude Code CLI
  • Same flow with OpenClaw card → TOML has [openclaw] enabled = true, base_url = "http://127.0.0.1:9000"; local daemon at :9000 handles messages
  • Same flow with Hermes card → TOML has [hermes] enabled = true, base_url = "http://127.0.0.1:7890"
  • Edit bridge_adapter on an existing openclaw agent → UI prompts re-download; server session.start now uses the new adapter
  • Switch runtime to something the connected bridge doesn't advertise → OpenClawSettings shows red mismatch banner before chat; AgentDetail dot turns amber
  • Stop local bridge → AgentDetail dot turns red within one poll cycle (~5s)
  • Unset PUBLIC_BASE_URL/installer download returns 500 (no silent Host-header fallback)
  • Trigger installer build failure (e.g. delete bundled exe) → API key remains valid, bridge stays connected
  • Re-download installer twice with a connected bridge → bridge stays online across both downloads; agents.api_key_hash unchanged between downloads
  • Legacy openclaw agent (api_key=NULL) → first download after migration mints + dual-writes; subsequent download reuses
  • Click "Regenerate API Key" explicitly → bridge disconnects (old key revoked); new download reuses the new key
  • Unix installer: bash install-clawith-bridge.sh on Linux registers clawith-bridge.service under systemd --user

🤖 Generated with Claude Code

TatsuKo-Tsukimi and others added 2 commits April 22, 2026 11:49
Adds a reverse-WebSocket bridge that lets users run local agents
(Claude Code, OpenClaw daemon, Hermes) from their own machine while
keeping the Clawith platform as the control plane.

Server
- /ws/bridge endpoint (backend/app/api/bridge_ws.py): per-agent WS
  channel; bridge advertises available adapters, server dispatches
  sessions via SessionStartFrame.adapter.
- local_agent/ service package: adapter registry, protocol frames,
  session dispatcher.
- Installer endpoint on /agents/{id}/bridge-installer: renders a
  platform-specific installer with the agent's API key baked in.
  Windows returns a single pristine clawith-bridge.exe + JSON config
  trailer (magic CLWB!END); Unix returns a bash script that installs
  the pip package and registers launchd/systemd.
- agent_type='openclaw' now carries a bridge_adapter field
  (claude_code | openclaw | hermes) chosen at creation time. Server
  strictly matches the bridge's advertised adapters and fails with
  a clear error otherwise, instead of silently falling back.

Bridge package (bridge/)
- clawith_bridge: Python package with three adapters and a reverse-WS
  session manager.
- PyInstaller spec that produces a single clawith-bridge.exe.
- baked_config trailer reader: on first launch from the
  setup.exe, extracts server/token/adapter, runs the install flow,
  strips the trailer from the installed copy, and registers a user
  scheduled task.
- install_windows.py: copies the exe to %LOCALAPPDATA%/Clawith/bin/,
  writes ~/.clawith-bridge.toml with only the chosen adapter enabled,
  registers a user-scope scheduled task that auto-starts at logon.

Frontend
- AgentCreate: 4 flat runtime cards (Platform Hosted / Claude Code /
  OpenClaw / Hermes). Non-native cards set agent_type='openclaw' with
  different bridge_adapter values.
- AgentDetail: badge shows "Bridge . <Runtime> . Lab".
- OpenClawSettings: bridge installer download block with a readonly
  Runtime line so users can confirm which adapter their installer
  will enable.
- i18n keys for wizard.runtime.* and wizard.bridge.* in en/zh.

Migrations
- add_bridge_mode: opt-in flag for bridge-style agents.
- add_bridge_adapter: new column, backfills existing openclaw agents
  to claude_code (matches current de-facto behavior).

Build
- bridge/clawith-bridge.exe is ignored and rebuilt via
  pyinstaller clawith-bridge.spec, then copied into
  backend/app/static/bridge/ before shipping.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
session_dispatcher.py logs these two events when a bridge connects
or disconnects, but the enum never included them — so every
attach/detach raised InvalidTextRepresentationError in logs and the
activity row was dropped.

Adds both values via idempotent ALTER TYPE ... ADD VALUE IF NOT
EXISTS, matching the pattern used in add_agentbay_enum_value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc91ed8e39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread backend/app/api/agents.py Outdated
if getattr(agent, "bridge_mode", "disabled") == "disabled":
agent.bridge_mode = "enabled"

await db.commit()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Defer API key rotation until installer payload is built

Committing the regenerated key before calling render_installer means a failed download (for example, Windows binary missing and render_installer raising FileNotFoundError) still invalidates the currently running bridge token. In that failure path the API returns 503, but the old key is already revoked, so operators lose connectivity without receiving a usable installer. Move the commit after successful payload generation (or roll back on generation errors) so failed downloads are side-effect free.

Useful? React with 👍 / 👎.

Comment thread backend/app/api/agents.py
Comment on lines +244 to +247
bridge_adapter=(
(data.bridge_adapter or "claude_code")
if (data.agent_type or "native") == "openclaw"
else None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate bridge adapter before persisting OpenClaw agents

The create flow persists data.bridge_adapter directly for OpenClaw agents without constraining it to supported adapters. A client can store any string, which later causes chat dispatch to fail every turn with "Selected runtime ... is not available" while installer generation quietly falls back to claude_code, creating a persistent runtime mismatch for that agent. Add server-side validation/enumeration at creation time to reject unknown adapter values.

Useful? React with 👍 / 👎.

TatsuKo-Tsukimi and others added 3 commits April 22, 2026 12:02
…m gaps

Adds a live bridge-connection indicator so users can tell at a glance
whether their local bridge is running, instead of discovering offline
state by failing to chat.

Backend
- GET /agents/{id}/bridge-status: reuses check_agent_access, returns
  {connected, applicable, bridge_version, adapters, connected_at,
  active_sessions}. Non-bridge agents return applicable=false.

Frontend
- agentApi.bridgeStatus() client method.
- AgentDetail polls every 5s (only for agent_type=openclaw, only
  while page is foregrounded) and renders a colored dot inside the
  existing runtime badge — green when connected, red when offline,
  transparent while the first request is in flight.
- Tooltip shows bridge version + active session count when connected,
  and a hint to install/start the bridge when offline.

Enum sweep (fix)
- Also adds local_session_start / local_session_done /
  local_session_error / reverse_tool_call / reverse_tool_result /
  bridge_installer_download to activity_action_enum. These were all
  referenced from bridge code but missing from the enum, so every
  bridge session was silently losing its audit trail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After creating an openclaw agent the runtime was frozen: even though the
server/bridge supports claude_code/openclaw/hermes at runtime, the only
way to change adapter was to delete the agent and re-create it. This
change exposes runtime selection on the agent's OpenClaw Settings page.

- schemas.AgentUpdate: add bridge_adapter: str | None with regex guard
- agents.update_agent: silently drop bridge_adapter for non-openclaw
  agents instead of 422'ing (idempotent if UI sends it for native)
- OpenClawSettings.tsx: replace readonly runtime badge with a 3-button
  selector; on change, PATCH /agents/{id}, invalidate cache, and show a
  yellow notice telling the user to redownload the installer OR edit
  ~/.clawith-bridge.toml to set enabled=true under [<new_adapter>] and
  restart the bridge. The server side of the handshake picks up the new
  adapter on next SessionStart regardless of which path the user takes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…fail

When the installed bridge advertises an adapter different from what the
agent is configured for, server-side dispatch rejects with a chat-time
error. That's correct but late — users only discover the problem by
failing to chat, and the remediation (redownload or edit TOML) lives in
a chat bubble they likely dismissed.

Two changes, both driven by /agents/{id}/bridge-status polling:

- OpenClawSettings: RuntimeSelector now accepts the live bridge_status
  and renders a red banner whenever bridge.adapters doesn't include the
  selected runtime. The banner lists what the bridge actually advertises,
  points at the installer below, and shows the TOML escape hatch. The
  previous "justChanged" yellow banner is kept but only fires when the
  bridge is offline at the moment of switching (no live signal to rely on).

- AgentDetail bridge badge: three-state dot. Green = online + match,
  yellow = online + mismatch (with amber glow), red = offline. Tooltip
  spells out the mismatch and points at Settings. The prior two-state
  (green/red) would silently light green even when the agent's adapter
  was unreachable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64cb73d2b6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bridge/clawith_bridge/adapters/base.py Outdated
r.cancel()
watcher.cancel()
try:
await proc.wait()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Kill timed-out subprocess before waiting for exit

When SubprocessAdapter.start_session hits its timeout path, it breaks out of the event loop but then unconditionally waits on proc.wait() without terminating the child first. If the underlying CLI keeps running after output stalls, the bridge session can hang instead of completing, consuming a session slot and delaying session.error/done until a separate outer timeout cancels it. Ensure the process is terminated (or cancel() is invoked) before this wait so timeout_s actually bounds execution.

Useful? React with 👍 / 👎.

Comment thread backend/app/api/bridge_ws.py Outdated
Comment on lines +195 to +196
@router.get("/api/admin/bridge/status")
async def bridge_status():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce admin auth on bridge status endpoint

This route is exposed as /api/admin/bridge/status but has no authentication/authorization dependency, so unauthenticated callers can list connected bridge metadata (agent IDs, adapters, versions, active sessions). That leaks internal runtime topology and agent identifiers from what is clearly intended as an admin surface; it should be protected by the existing admin auth path before being publicly routable.

Useful? React with 👍 / 👎.

TatsuKo-Tsukimi and others added 2 commits April 22, 2026 13:01
Two issues from external review on the bridge PR:

- Remove GET /api/admin/bridge/status from bridge_ws.py. It had no auth
  enforcement and exposed every connected bridge's agent_id, version,
  and active sessions to anyone who could reach the backend. The
  comment admitted "V1 doesn't do auth, add later" — 'later' was never
  going to be safer than now. The per-agent /agents/{id}/bridge-status
  endpoint added with the bridge status badge already covers the
  legitimate use case and is auth-gated.

- Require PUBLIC_BASE_URL for bridge-installer downloads instead of
  falling back to the request's Host / X-Forwarded-Host header. An
  authenticated attacker could set those to an attacker-controlled
  hostname and have the installer bake that into the bridge's
  dial-home URL. Validation now runs before the DB mutation, so a
  misconfiguration doesn't rotate the API key and leave the agent
  wedged. .env.example calls out the requirement explicitly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three focused test files covering the code paths most likely to silently
regress the bridge feature:

- test_bridge_installer_template.py: render_installer(adapter=...) with
  each of claude_code/openclaw/hermes produces a TOML section block
  with exactly that adapter's `enabled = true` and the others `false`;
  unknown adapters fall back to claude_code; agent_name with CR/LF is
  flattened so it can't escape the `# Agent:` comment line.
- test_agent_schema_bridge_adapter.py: AgentCreate/AgentUpdate reject
  typos, wrong case, wrong separators, and injection-style values.
- test_bridge_migrations_idempotent.py: loads each enum migration by
  file path (alembic/versions isn't a package), mocks op.execute, and
  asserts every emitted statement carries ADD VALUE IF NOT EXISTS —
  the clause that lets the backend boot re-run migrations safely on
  every container restart. Also cross-checks that every bridge action
  type logged by the session code is present in the enum.

Also adds the same regex constraint to AgentCreate.bridge_adapter that
was already on AgentUpdate.bridge_adapter — previously Create accepted
arbitrary strings, so a typo could end up in the DB and only surface at
chat-time when the dispatcher couldn't find a matching adapter.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfe781f76d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +144 to +146
except FileNotFoundError as e:
yield SessionEvent(kind="stderr_chunk", payload={"text": f"{argv[0]!r} not found: {e}"})
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Surface missing adapter executable as session error

When a local runtime binary is missing (for example, claude not installed), this branch emits a stderr_chunk and returns normally, which makes SessionManager._run treat the run as successful and send session.done with exit_code 0 instead of session.error. In practice, bridge startup misconfiguration is reported as a blank/"successful" turn, so operators get no reliable failure signal. This path should raise or otherwise propagate a terminal error state so the session is marked failed.

Useful? React with 👍 / 👎.

Comment on lines +773 to +777
} else if (data.role && data.content) {
// Legacy format: {role, content}
setMessages(prev => [...prev, { role: data.role, content: data.content }]);
} else {
// Unknown event with no role/content — e.g. Anthropic SDK
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Render websocket error frames instead of dropping them

The new bridge flow sends explicit {"type":"error","content":...} frames for runtime mismatch, offline bridge, and dispatch failures (backend/app/api/websocket.py), but this parser now drops any frame without role/content in the fallback branch. As a result, users get no visible message for these failures and the chat appears to silently stall. Handle data.type === "error" as a user-visible message before this fallback.

Useful? React with 👍 / 👎.

Comment on lines +139 to +141
Type=simple
ExecStart=%h/.local/bin/clawith-bridge
Restart=on-failure
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid hardcoding ~/.local bridge path in Linux unit

The installer can intentionally skip pip install --user when import clawith_bridge already works (dev mode or prior install), but the generated Linux systemd unit always runs %h/.local/bin/clawith-bridge. In those skip-install environments that path may not exist, so the service fails immediately even though the package is importable. Use a launcher that works regardless of install location (for example python3 -m clawith_bridge) or resolve the actual executable path during install.

Useful? React with 👍 / 👎.

…ut kill

Two independent error-path bugs flagged in the PR review, both visible
only when something else has already gone wrong, both leaving the
system in a wedged state with no user-driven recovery:

1. bridge-installer: defer API key rotation until render_installer()
   succeeds. The Windows exe ships bundled in the backend image; if an
   operator forgets to build it, render_installer raises FileNotFoundError
   and we return 503. But we used to commit the rotated key BEFORE
   building, so that 503 left the caller's already-running bridge with
   a dead token and no installer — the only recovery was a manual admin
   key reset. Now we generate the candidate key, build the payload, and
   only then commit the rotation + bridge_mode auto-enable.

2. SubprocessAdapter: on timeout, the finally block did `await proc.wait()`
   without first terminating the child. If the CLI kept running past
   timeout_s (e.g. hanging on network I/O), `wait()` blocked forever and
   the session slot stayed held until session_manager's outer cancel
   fired — i.e. timeout_s wasn't actually an upper bound. Extracted the
   terminate→grace→kill sequence into `_terminate_proc()` and call it
   from both the streaming finally block and cancel(), so timeout_s
   is now an actual bound regardless of what the child is doing.

Note: the bridge-side fix changes code that gets baked into the bundled
Windows exe (`backend/app/static/bridge/clawith-bridge.exe`). That exe
will need to be rebuilt + dropped back in before the fix reaches end
users on Windows; the source fix is what's under review here.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72e6e522de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bridge/clawith_bridge/adapters/base.py Outdated
Comment on lines +323 to +325
except Exception as e:
yield SessionEvent(kind="stderr_chunk", payload={"text": f"daemon start failed: {e}"})
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Propagate daemon start failures as session errors

When start_session_request raises (for example Hermes/OpenClaw daemon offline), this branch converts it into a stderr_chunk and returns normally. SessionManager._run treats that as a successful adapter completion and emits session.done (exit code 0), so failed local-session dispatches are reported as success instead of a terminal error, which makes runtime outages hard to detect and retries misleading.

Useful? React with 👍 / 👎.

Comment on lines +331 to +333
if asyncio.get_event_loop().time() - start_t > timeout_s:
yield SessionEvent(kind="stderr_chunk", payload={"text": f"timeout after {timeout_s}s"})
break
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Cancel daemon jobs when session timeout is hit

On timeout this loop only emits a stderr_chunk and breaks, but it never calls cancel_request(task_id). That leaves the daemon task running in the background even though the bridge session is considered finished, which can continue mutating state after the user-facing session ended and also reports completion through session.done instead of a hard failure path.

Useful? React with 👍 / 👎.

try:
# Await the returned future directly — avoids a race with
# session.done popping the session from bridge.sessions.
_final_text = await _aio_br.wait_for(_fut, timeout=1800)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle abort control frames during bridge session waits

In the OpenClaw bridge path the handler blocks on _fut for up to 1800s without a concurrent control-frame reader, so frontend stop actions ({"type":"abort"}) are not processed until the local session already ends. Users cannot cancel long-running bridge executions, unlike the native LLM path that actively listens for abort messages.

Useful? React with 👍 / 👎.

Covers three paths that shipped without regression coverage:
- GET /agents/{id}/bridge-status response shape (native vs openclaw,
  connected vs disconnected, optional-field fallbacks)
- PATCH /agents/{id} silent-drop guard for native agents
- POST /agents/ default bridge_adapter resolution (claude_code for
  openclaw, NULL for native even if client sends a value)
@TatsuKo-Tsukimi TatsuKo-Tsukimi force-pushed the feat/local-agent-bridge branch from 72e6e52 to 824a88f Compare April 22, 2026 06:22
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 824a88f463

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +93 to +95
yield SessionEvent(kind=kind, payload=evt.get("payload") or {})
if kind in ("done", "finished"):
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Map OpenClaw terminal events to valid protocol kinds

When the local OpenClaw daemon emits {"kind":"done"} or {"kind":"finished"}, this branch yields that raw kind and only then returns. SessionManager._run wraps every event in SessionEventFrame, but done/finished are not allowed by the protocol EventKind literal, so frame construction raises and the session is reported as session.error instead of completing successfully. This will trigger on exactly the terminal event shapes this adapter is trying to support.

Useful? React with 👍 / 👎.

Comment on lines +124 to +127
async def fail_all(self, reason: str) -> None:
for sid, running in list(self._sessions.items()):
running.task.cancel()
self._sessions.clear()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Cancel adapter jobs when bridge disconnects

On connection teardown, BridgeClient._read_loop calls SessionManager.fail_all("bridge disconnected"), but this method only cancels asyncio tasks and clears _sessions. It never calls running.adapter.cancel(...), so daemon-backed jobs (Hermes/OpenClaw) are left running on the local daemon even though the bridge is gone and the server has already abandoned those sessions. This creates orphaned work and can continue mutating state after disconnect.

Useful? React with 👍 / 👎.

async def _git_diff_summary(snap: _GitSnapshot) -> dict[str, Any]:
# Include both committed changes (HEAD..HEAD) — N/A here, snap.head == current HEAD
# and uncommitted changes (working tree + index vs. HEAD).
rc, out, err = await _run(["git", "diff", "--numstat", snap.head], cwd=snap.cwd)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Diff against a real session baseline, not just HEAD

The git diff summary uses only the commit hash captured at session start and runs git diff --numstat <head> at session end. If the repo was already dirty before the session began, all pre-existing changes are counted in every diff_summary, inflating files_changed/insertions/deletions and misattributing edits to the current run. A per-session baseline of worktree/index state is needed to report only changes introduced during this session.

Useful? React with 👍 / 👎.

TatsuKo-Tsukimi and others added 2 commits April 22, 2026 15:07
Downloading an installer no longer rotates the agent's API key. The
plaintext key is now stored in `agents.api_key` alongside the existing
hash; download reuses it, so re-downloading (for a different runtime,
another platform, another machine) keeps any running bridge online.

Legacy agents without plaintext mint + dual-write once on their first
download — a one-time opportunistic upgrade preserved behind the same
rollback-safe pattern as before (no DB mutation until render succeeds).

Explicit rotation stays on `POST /agents/{id}/api-key`, which now
dual-writes both columns so gateway's existing plaintext/hash fallback
stays consistent.

Tests: 5 new cases in `test_install_rotate_decoupling.py` covering
reuse, legacy upgrade, render-failure rollback, explicit rotate
dual-write, and openclaw-create dual-write.

Frontend: removes the "download regenerates the key, bridge will
disconnect" warning from OpenClawSettings; the existing "Regenerate
API Key" section already carries that warning and remains the
explicit rotation UI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Design-only draft for Phase 2 — no code, schema, or protocol changes.

Motivates the next step after the install/rotate decouple: collapsing
the 1:1 bridge-per-agent fanout so one bridge process on a machine can
serve every agent the owner runs there. Covers data model (`bridges`
table + `agents.bridge_id`), protocol extension (`bridge.register` with
`bridge_id`), auth path, session dispatcher, installer flow
("Add a machine" vs per-agent), legacy migration, rollout (feature
flag `ENABLE_PER_MACHINE_BRIDGE`), and open questions left for the
implementer.

Placed at repo root next to ARCHITECTURE_SPEC_EN.md / AGENTS.md since
`docs/` is .gitignored.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9431ae0a9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +538 to +541
"type": "done",
"role": "assistant",
"content": _persist_text,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Send fallback text when bridge session ends with error

When the OpenClaw bridge path fails, this done payload sends an empty content and puts the real failure only in session_error. Clients that only render done.content (for example the Agent Detail chat websocket handler) will show a blank assistant reply, so timeout/disconnect/runtime failures become invisible to users even though the backend has an error string. Populate content with a readable fallback when _session_err is set (or also emit a standard error frame) to keep existing clients informative.

Useful? React with 👍 / 👎.

AgentCreate flow now shows Platform Hosted vs On My Machine at Tier 1, and
reveals Claude Code / OpenClaw / Hermes as Tier 2 only after On My Machine is
selected. The prior flat 4-card layout conflated "where the agent runs" with
"which local runtime to use". The initial bridgeAdapter state is now empty,
and submit is guarded with a clearer "pick a local agent first" error.
The prior OpenClaw adapter was a placeholder that assumed an HTTP/SSE daemon
at :9000 — OpenClaw's actual runtime is a WebSocket JSON-RPC gateway at
:18789 with Ed25519 device-challenge auth. Rather than re-implementing all of
that, we now spawn `openclaw acp`, the CLI subcommand that wraps the gateway
and speaks Agent Client Protocol (JSON-RPC 2.0) over stdio. OpenClaw handles
the auth/framing internally; we just route messages.

- new: ACPSubprocessAdapter base in adapters/acp_base.py — JSON-RPC read/write
  loop, request/response correlation, session/update notification mapping to
  Clawith EventKind, agent-initiated request handling (auto-approves
  session/request_permission so prompts can proceed)
- rewrite: OpenClawAdapter now extends ACPSubprocessAdapter with a thin
  build_acp_argv that threads config.extra (url, token_file, session_label,
  provenance, verbose) into `openclaw acp` flags
- fix: DaemonAdapter.start_session now re-raises after emitting the
  diagnostic stderr_chunk. Before, a daemon connection failure silently
  yielded and returned — the session manager interpreted it as a clean
  completion and sent SessionDoneFrame(exit=0, final=""), presenting as a
  green status and an empty reply. Re-raising surfaces session.error
  instead. (Hermes still uses DaemonAdapter — this fix turns its current
  silent-empty-reply UX into a clear error until that adapter is also
  rewritten.)
- Windows .cmd shim handling generalized in acp_base.resolve_stdio_executable,
  parallel to claude_code.py's resolver
- 25 tests covering notification mapping, executable resolution, end-to-end
  subprocess round-trip with a fake ACP agent, and daemon-failure regression

Hermes is intentionally left on its placeholder this round — can't be E2E
verified without a local Hermes install. Will follow in a separate PR.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9cd7b74989

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread backend/app/api/agents.py
Comment on lines 788 to 791
raw_key = f"oc-{secrets.token_urlsafe(32)}"
agent.api_key = raw_key
agent.api_key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
await db.commit()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Revoke active bridge sessions when rotating API keys

Updating api_key_hash here does not invalidate an already-connected bridge: /ws/bridge authenticates only once at connect time, and the in-memory dispatcher keeps serving that socket by agent_id afterward. In practice, rotating a compromised key still leaves the existing bridge fully authorized to run sessions until it disconnects, which defeats the expected revocation semantics of this endpoint.

Useful? React with 👍 / 👎.

Comment thread backend/app/api/agents.py
agent.api_key = raw_key
agent.api_key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
agent.status = "idle"
agent.bridge_mode = "enabled"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Default new OpenClaw agents to fallback-capable bridge mode

New OpenClaw agents are forced to bridge_mode="enabled" at creation, but the chat websocket path rejects requests when no bridge is connected instead of queueing to gateway fallback (the fallback path is only used for auto/disabled). This makes fresh agents fail chat out of the box until operators manually install/connect a bridge, even though creation still returns API-key-based onboarding.

Useful? React with 👍 / 👎.

`/ws/bridge` authenticates only at the WebSocket upgrade — once a bridge
is connected the read loop never re-checks the token. Rotating the hash
alone left the old socket running sessions on a revoked credential until
it disconnected on its own. Codex P1 flagged this.

Fix:
- `SessionDispatcher.detach_bridge` gains `close_ws` + `reason` params.
  When set, the server closes the WS with code 4001 (same class as
  upgrade-time auth failure) before failing in-flight sessions. Existing
  call site in `/ws/bridge`'s read-loop finally keeps the no-close
  default, so no behavior change there.
- Rotate endpoint calls `detach_bridge(close_ws=True, reason="api_key_rotated")`
  after committing the new hash. Wrapped in try/except so an eviction
  failure does not mask the successful rotation from the operator.

Tests pin both the dispatcher contract (close code, reason, in-flight
session failure, no-op when unattached) and the integration (rotate
endpoint invokes detach_bridge with the right args, still succeeds if
eviction raises).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Agent.api_key_hash == api_key,

P1 Badge Authenticate against plaintext API key column

_get_agent_by_key’s “plaintext” branch compares the caller token to Agent.api_key_hash instead of Agent.api_key, which means a stored SHA-256 hash is accepted as a valid credential. In this codebase api_key_hash is returned in agent payloads (AgentOut), so any principal that can read an agent record can replay that hash to /gateway/* or /ws/bridge and fully impersonate the agent without the real key.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +492 to +493
await websocket.send_json(msg)
except Exception:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Cancel bridge session when chat websocket send fails

When the client socket drops mid-turn, _bridge_drain swallows the send_json failure and returns, but the outer handler still waits on _fut for up to 1800s and never cancels the dispatched local session. In that disconnect scenario, bridge work continues with no recipient, consuming bridge capacity and potentially applying side effects long after the user has gone away.

Useful? React with 👍 / 👎.

…ops crashing startup

When the host-side ./ss-nodes.json is absent, Docker auto-creates the
bind-mount source as a directory. The previous os.path.exists() check
passed, then open() raised IsADirectoryError — surfaced as
"Background task ss-local-proxy CRASHED" on every container start.

isfile() treats the directory as absent and takes the clean skip branch.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 618ec2176c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +338 to +342
if remaining <= 0:
yield SessionEvent(kind="stderr_chunk", payload={
"text": f"timeout after {timeout_s}s",
})
break
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Raise ACP timeout as terminal session failure

When timeout_s is exceeded, this branch only emits a stderr_chunk and breaks, so start_session returns normally; SessionManager._run then reports session.done with success semantics instead of session.error. In long-running OpenClaw ACP sessions, a true timeout is therefore misreported as a successful completion, which hides failures and can produce misleading downstream chat state.

Useful? React with 👍 / 👎.

Comment thread backend/app/api/agents.py
Comment on lines +875 to +881
existing_plaintext = getattr(agent, "api_key", None)
if existing_plaintext:
raw_key = existing_plaintext
needs_persist = False
else:
raw_key = f"oc-{secrets.token_urlsafe(32)}"
needs_persist = True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Serialize legacy installer key minting per agent

This legacy-upgrade path mints a new key whenever api_key is null, but it does so before any lock/recheck; two concurrent POST /agents/{id}/bridge-installer calls can each bake different keys, and the later commit overwrites api_key/api_key_hash, invalidating the earlier downloaded installer immediately. This can happen with parallel retries/tabs/admin actions and causes flaky first-time bridge setup for pre-migration agents.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant