feat: operator rate control, pause-on-typing, quiet-peer liveness & forms-only protocol#40

Merged
obeone merged 12 commits into main from feat/operator-controls-and-liveness
Jun 19, 2026

@obeone obeone commented Jun 18, 2026

Operator controls & agent liveness

Five operator-facing improvements to the hub, plus one latent bug found along the way. No manual version bump — release-please owns the release from these conventional commits. PROTOCOL_VERSION is bumped 14 → 15 (independent of the package version).

What's in it

  • Fix the dead /ui control wire (found while scoping pause-on-typing): the dashboard sent {mode: action} but the hub only dispatches on {action}, so operator pause/resume/stop were silently dropped over the socket. Now sends {action}, pinned by a fails-before vitest guard + a hub-side coverage test.
  • Runtime rate-limit control — operator sets the global send rate (messages /s or /min) and burst from the UI. set_rate_limit retunes every live and reaped bucket in place (never reconstructs — that would reseed tokens to a full burst), validates first and is a strict no-op on reject. A reserved peer key is rejected today so a per-peer override can extend the wire later.
  • Pause while typing — opt-in toggle (off by default) that pauses the room the instant the operator types into an empty composer and auto-resumes on send. A four-phase state machine (idle → waiting → confirmed → cancelled) arms the auto-resume only after the hub echoes the pause, so it never fights a manual pause or another operator's mode change.
  • Quiet-peer detection — the hub flags a live, non-paused peer that has gone silent past a single threshold (QUIET_AFTER_SECONDS, default 180s, env-overridable) with both no /receive poll and no set_status. Surfaced as an amber "quiet" badge in the HealthPanel — advisory, not an error, since a peer mid-long-turn can legitimately be quiet. The threshold sits below the 300s reaper so a peer surfaces before it vanishes.
  • Regular signs of life: set_status is now load-bearing. Agents (especially ones peers wait on) report status between turns or surface as quiet. The HealthPanel shows each peer's status age and dims it when stale.
  • Protocol — agents talk to the human exclusively via ask_operator forms, and must signal in the room before any private exchange with the operator. PROTOCOL_TEXT amended, caucus-protocol.md mirror synced, drift-guarded by a shared canonical phrase.

Why pause/resume "never worked"

The wire mismatch was latent: control-mode over /ui has been a no-op the whole time (the REST /control path works). Pause-on-typing reuses that path, so the fix ships first as commit 1.
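
The drop can be reproduced with a minimal sketch, assuming the hub picks the first recognized command key from the frame. `MUTATING_COMMANDS`, `MODE_FOR_ACTION`, `Room` and `apply_ui_command` are simplified, hypothetical stand-ins for the hub's `_MUTATING_COMMANDS` / `_apply_ui_command`, and the mode names are assumed:

```python
from dataclasses import dataclass

# Hypothetical stand-ins; the real hub tables are not shown in this PR text.
MUTATING_COMMANDS = {"action"}
MODE_FOR_ACTION = {"pause": "paused", "resume": "running"}  # assumed mode names


@dataclass
class Room:
    mode: str = "running"


def apply_ui_command(room: Room, frame: dict) -> bool:
    """Apply a /ui control frame; return True only if it was dispatched."""
    key = next((k for k in frame if k in MUTATING_COMMANDS), None)
    if key is None:
        return False  # no recognized command key: the frame is silently dropped
    new_mode = MODE_FOR_ACTION.get(frame[key])
    if new_mode is None:
        return False  # unknown action value
    room.mode = new_mode
    return True


room = Room()
old_ok = apply_ui_command(room, {"mode": "pause"})    # old dashboard payload
mode_after_old = room.mode
new_ok = apply_ui_command(room, {"action": "pause"})  # corrected payload
mode_after_new = room.mode
```

With the old `{mode: ...}` payload nothing matches and the room mode is untouched; only the `{action: ...}` payload flips it.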

Verification

  • Python: 475 passed, ruff clean, mypy --strict clean (13 files).
  • Web: 136 passed (vitest), tsc -b strict + vite build clean; the UI bundle (src/caucus/ui/) is rebuilt and committed with each web change.
  • An adversarial multi-agent review (correctness / security / contract / pause-races / protocol) returned 0 confirmed findings.

Design decisions (locked with the requester)

  • Pause-on-typing auto-resumes on send.
  • Quiet detection is detection + alert only (no auto-nudge); status-age is the primary signal with a poll-age guard, single 180s threshold.
  • Rate scope is global for now (per-peer reserved).

obeone added 12 commits June 18, 2026 05:09
… reach the hub

The dashboard sent {mode: action} but the hub gates and dispatches /ui
control-mode on the "action" key (_MUTATING_COMMANDS / _apply_ui_command),
so the frame matched no command key and was silently dropped, leaving
operator pause/resume/stop/reset dead over the WebSocket. Send {action}.

Add a vitest guard pinning the corrected wire format (fails on the old
{mode} payload) and a hub-side test asserting an operator action:pause
actually flips the room mode.

HubState.set_rate_limit retunes the global send rate at runtime: it updates
the per-peer bucket defaults (so future registrations inherit them) and
reconfigures every live and reaped peer's TokenBucket in place. New
TokenBucket.reconfigure credits elapsed idle time under the old rate, adopts
the new params and clamps tokens down to the new capacity — mutating in
place rather than reconstructing, which would re-trigger __post_init__ and
silently reseed tokens to a full burst on each retune.

Validation is server-side and a strict no-op on rejection (refill_rate > 0,
capacity >= 1.0): an invalid frame leaves the defaults and every bucket
byte-identical. The current rate is pushed as a {type:rate} event and added
to the /ui snapshot so the dashboard can reflect and edit it.
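
A minimal sketch of the in-place retune described above, assuming a bucket that refills continuously and is seeded full in `__post_init__`. The field names `tokens` and `updated_at`, the `_refill` helper and the `now` parameter are assumptions made for the illustration, not the hub's actual attributes:

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    refill_rate: float  # tokens per second
    capacity: float     # burst size
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        # A fresh bucket starts full, which is why rebuilding the object on
        # every retune would hand the peer a full burst again.
        self.tokens = self.capacity
        self.updated_at = time.monotonic()

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now

    def reconfigure(self, refill_rate: float, capacity: float,
                    now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._refill(now)                         # credit idle time at the OLD rate
        self.refill_rate = refill_rate
        self.capacity = capacity
        self.tokens = min(self.tokens, capacity)  # clamp down, never top up


# Shrinking the burst clamps the balance down to the new capacity.
shrunk = TokenBucket(refill_rate=1.0, capacity=10.0)
shrunk.tokens, shrunk.updated_at = 2.0, 100.0
shrunk.reconfigure(5.0, 4.0, now=103.0)   # 2 + 3s * 1/s = 5, clamped to 4

# Growing the burst does not reseed the bucket to full.
grown = TokenBucket(refill_rate=1.0, capacity=10.0)
grown.tokens, grown.updated_at = 1.0, 100.0
grown.reconfigure(0.5, 20.0, now=101.0)   # 1 + 1s * 1/s = 2, stays at 2
```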
_apply_ui_command now handles {set_rate:{refill_rate,capacity}} by calling
state.set_rate_limit, and set_rate joins _MUTATING_COMMANDS so an observer
connection is refused with the standard forbidden error. A reserved "peer"
key is rejected as a no-op today so the planned per-peer override extends
the wire contract instead of breaking it; malformed payloads are ignored.
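
The gating order above can be sketched as follows. This is an illustration only: `handle_set_rate`, its string return values and the `set_rate_limit` callback signature are hypothetical, chosen to make the three outcomes (forbidden, ignored, applied) visible:

```python
from typing import Any, Callable


def handle_set_rate(frame: dict[str, Any], *, is_operator: bool,
                    set_rate_limit: Callable[[float, float], bool]) -> str:
    """Return 'forbidden', 'ignored' or 'applied' for a /ui frame."""
    if "set_rate" not in frame:
        return "ignored"
    if not is_operator:
        return "forbidden"  # observers may not send mutating commands
    payload = frame["set_rate"]
    if not isinstance(payload, dict):
        return "ignored"    # malformed payload
    if "peer" in payload:
        return "ignored"    # reserved for a future per-peer override
    rate, cap = payload.get("refill_rate"), payload.get("capacity")
    for value in (rate, cap):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return "ignored"
    return "applied" if set_rate_limit(float(rate), float(cap)) else "ignored"


calls: list[tuple[float, float]] = []


def fake_set_rate_limit(rate: float, cap: float) -> bool:
    calls.append((rate, cap))
    return True


good = {"set_rate": {"refill_rate": 2, "capacity": 5}}
observer = handle_set_rate(good, is_operator=False, set_rate_limit=fake_set_rate_limit)
per_peer = handle_set_rate({"set_rate": {"peer": "a", "refill_rate": 2, "capacity": 5}},
                           is_operator=True, set_rate_limit=fake_set_rate_limit)
malformed = handle_set_rate({"set_rate": "fast"}, is_operator=True,
                            set_rate_limit=fake_set_rate_limit)
applied = handle_set_rate(good, is_operator=True, set_rate_limit=fake_set_rate_limit)
```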
peer_info now derives two signals from one configurable threshold so the
operator can see who has gone silent. A peer is "quiet" when it is live, not
operator-paused, and has passed QUIET_AFTER_SECONDS (default 180s, overridable
via CAUCUS_QUIET_AFTER_SECONDS) with BOTH no /receive poll AND no set_status
update — the poll-age guard means an actively polling peer is never flagged.
The threshold sits between a realistic single long turn (a passive bridge
fires no tool call mid-turn; a native does not poll while reasoning) and the
300s reaper, so a normal turn never trips it but a genuinely silent peer
surfaces before it is reaped.

status_stale is a looser cosmetic dim derived from the same tunable
(0.66x), so the operator holds a single number. Both ride the existing
peers_info path into the /ui health tick and snapshot; the snapshot also
carries the active quiet_after threshold.
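
A sketch of how both signals could fall out of the single threshold. The function name `liveness_flags`, its parameters and the strict `>` comparisons are assumptions; only the 180s default, the `CAUCUS_QUIET_AFTER_SECONDS` override, the both-signals rule and the 0.66x factor come from the description above:

```python
import os

QUIET_AFTER_SECONDS = float(os.environ.get("CAUCUS_QUIET_AFTER_SECONDS", "180"))
STALE_FACTOR = 0.66  # status_stale trips earlier than quiet, from the same tunable


def liveness_flags(*, live: bool, paused: bool, poll_age: float,
                   status_age: float,
                   quiet_after: float = QUIET_AFTER_SECONDS) -> dict:
    """Derive the advisory 'quiet' flag and the cosmetic 'status_stale' dim."""
    quiet = (
        live
        and not paused
        and poll_age > quiet_after      # no /receive poll within the window
        and status_age > quiet_after    # AND no set_status within the window
    )
    status_stale = status_age > STALE_FACTOR * quiet_after
    return {"quiet": quiet, "status_stale": status_stale}


# An actively polling peer is never flagged, even with an old status.
polling = liveness_flags(live=True, paused=False, poll_age=5.0,
                         status_age=400.0, quiet_after=180.0)
# Silent on both channels past the threshold: flagged.
silent = liveness_flags(live=True, paused=False, poll_age=200.0,
                        status_age=200.0, quiet_after=180.0)
# Operator-paused peers are exempt.
paused = liveness_flags(live=True, paused=True, poll_age=200.0,
                        status_age=200.0, quiet_after=180.0)
# A recent status is neither quiet nor stale (100s < 0.66 * 180s).
fresh = liveness_flags(live=True, paused=False, poll_age=200.0,
                       status_age=100.0, quiet_after=180.0)
```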
…igns of life

Amend PROTOCOL_TEXT and bump PROTOCOL_VERSION 14 -> 15. Agents now talk to
the human exclusively through ask_operator forms — never a plain say() — and
must announce in the room before any private exchange with the operator, so
the room always knows a side conversation is happening even if it cannot see
it. Strengthen the set_status cadence into a load-bearing rule: a peer that
neither polls nor reports a status looks dead to the hub and surfaces as
"quiet" on the operator console, so agents (especially ones peers are waiting
on) must give regular signs of life via set_status between turns.

A reminder comment at PROTOCOL_VERSION points to the caucus-protocol.md
mirror so the two never drift.

Mirror the PROTOCOL_VERSION 15 amendment into the human-readable protocol
copy: add ask_operator/list_forms to the tool table, a new "Asking the human
(forms)" section stating forms are the only channel to the operator and the
signal-before-private rule, and a Discipline bullet on giving regular signs of
life via set_status before going "quiet". A test_protocol_md drift guard pins
the shared "taking this to the operator privately" sentence in both this file
and the hub's PROTOCOL_TEXT so they cannot diverge.
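
The drift-guard idea can be sketched as a check that one canonical phrase appears in every copy of the protocol. `missing_phrase` and the sample texts are hypothetical; the real test presumably reads caucus-protocol.md and the hub's PROTOCOL_TEXT rather than inline strings:

```python
CANONICAL_PHRASE = "taking this to the operator privately"


def missing_phrase(texts: dict[str, str],
                   phrase: str = CANONICAL_PHRASE) -> list[str]:
    """Return the names of the texts that do not contain the phrase."""
    # Collapse whitespace so a hard-wrapped markdown line still matches.
    return [name for name, text in texts.items()
            if phrase not in " ".join(text.split())]


in_sync = missing_phrase({
    "PROTOCOL_TEXT": "Announce that you are taking this to the operator privately.",
    "caucus-protocol.md": "Say you are taking this to the\noperator privately first.",
})
drifted = missing_phrase({
    "PROTOCOL_TEXT": "Announce that you are taking this to the operator privately.",
    "caucus-protocol.md": "Tell the room before a private exchange.",
})
```

An empty list means the copies agree on the pinned sentence; any name in the list identifies the copy that drifted.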
RateControl lets the operator retune the global send rate at runtime: a value
field, a per-second/per-minute unit toggle and a burst field that convert to
the hub's {set_rate:{refill_rate,capacity}} frame. The store gains a rate slice
(RateInfo, snapshot.rate, the {type:rate} event, and sendSetRate). The panel is
operator-only and notes that a change applies to all peers and clamps in-flight
bursts.
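
The unit conversion amounts to normalizing the entered value to a per-second rate. A sketch, assuming `refill_rate` is expressed in messages per second and burst maps directly to `capacity`; the function name and the unit labels `"s"` / `"min"` are made up for the illustration (the real panel is part of the web UI):

```python
SECONDS_PER_UNIT = {"s": 1.0, "min": 60.0}


def to_set_rate_frame(value: float, unit: str, burst: float) -> dict:
    """Build the {set_rate: {refill_rate, capacity}} frame from panel inputs."""
    refill_rate = value / SECONDS_PER_UNIT[unit]  # normalize to messages/second
    return {"set_rate": {"refill_rate": refill_rate, "capacity": float(burst)}}


per_minute = to_set_rate_frame(30, "min", 5)  # 30 messages/min, burst of 5
per_second = to_set_rate_frame(2, "s", 3)     # 2 messages/s, burst of 3
```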
A "Pause while typing" toggle (off by default, persisted) pauses the room the
instant the operator starts typing into an empty composer and auto-resumes it
on send. A four-phase state machine (idle -> waiting -> confirmed -> cancelled)
arms the auto-resume only once the hub echoes the pause back, so it never
fights a manual pause and never spuriously resumes when another operator
changed the mode meanwhile. Clearing the box or turning the toggle off also
resumes; an inline hint shows while auto-paused.
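
One way to read the four phases, written in Python for consistency with the hub even though the composer lives in the web UI. The transition rules here are an interpretation of the description above, and `TypingPause`, `on_input`, `on_hub_mode`, `release` and the mode names are all hypothetical:

```python
from enum import Enum
from typing import Callable


class Phase(Enum):
    IDLE = "idle"
    WAITING = "waiting"      # pause sent, hub echo not seen yet
    CONFIRMED = "confirmed"  # hub echoed our pause: auto-resume is armed
    CANCELLED = "cancelled"  # mode changed under us: never auto-resume


class TypingPause:
    def __init__(self, send: Callable[[dict], None]) -> None:
        self.phase = Phase.IDLE
        self.room_mode = "running"  # assumed mode names
        self._send = send

    def on_input(self, before: str, after: str) -> None:
        if before == "" and after != "":
            # Only arm from idle and only if the room is not already paused,
            # so a manual pause is never taken over.
            if self.phase is Phase.IDLE and self.room_mode == "running":
                self._send({"action": "pause"})
                self.phase = Phase.WAITING
        elif after == "":
            self.release()  # clearing the box releases the pause

    def on_hub_mode(self, mode: str) -> None:
        self.room_mode = mode
        if self.phase is Phase.WAITING:
            self.phase = Phase.CONFIRMED if mode == "paused" else Phase.CANCELLED
        elif self.phase is Phase.CONFIRMED and mode != "paused":
            self.phase = Phase.CANCELLED  # someone else changed the mode

    def release(self) -> None:
        """Called on send, on clearing the box, or when the toggle is turned off."""
        if self.phase is Phase.CONFIRMED:
            self._send({"action": "resume"})
        self.phase = Phase.IDLE


# Normal path: type, hub echoes the pause, send resumes.
sent_normal: list[dict] = []
normal = TypingPause(sent_normal.append)
normal.on_input("", "h")
normal.on_hub_mode("paused")
normal.release()

# Another operator stops the room while we type: no spurious resume.
sent_raced: list[dict] = []
raced = TypingPause(sent_raced.append)
raced.on_input("", "h")
raced.on_hub_mode("paused")
raced.on_hub_mode("stopped")
phase_before_release = raced.phase
raced.release()

# Room already paused manually: typing does not arm anything.
sent_manual: list[dict] = []
manual = TypingPause(sent_manual.append)
manual.on_hub_mode("paused")
manual.on_input("", "h")
manual.release()
```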
PeerInfo gains quiet and status_stale, and the HealthPanel renders them: a
live peer that has gone silent (no poll and no status update past the
threshold) gets an amber "quiet" badge with an advisory tooltip — amber, not
red, because a peer mid-long-turn can legitimately be quiet — plus a "N quiet"
figure in the stats bar. Each peer's self-reported status now shows its age and
dims when stale. This gives the operator an at-a-glance read of who is alive
and what they are doing.

…s-and-liveness

# Conflicts:
#	caucus-protocol.md
#	src/caucus/hub.py
…ars the composer

Typing the leading "/" of a slash-command goes empty to non-empty, so with
"Pause while typing" on the composer arms an auto-pause. Accepting the command
from the autocomplete dropdown clears the box through executeCommand, which
bypassed handleChange's clear-box-resume branch — so the auto-pause state
machine leaked: a stale phase, a lingering hint, a spurious resume on the next
send, and for /export (which sets no mode of its own) a room left silently
paused with no box content left to clear to release it.

executeCommand now releases the transient typing-pause itself: it resumes for
/export, and for the mode-setting commands (pause/resume/stop/reset) it forgets
the auto-pause without a spurious resume so the command's own terminal mode
stays authoritative. Covered by two new composer tests driving the real
autocomplete-accept path.

set_rate_limit validated only the lower bounds (refill_rate > 0, capacity >= 1).
NaN already fails those comparisons, but +inf slips through (inf > 0, inf >= 1)
and is accepted, silently disabling the limiter. That is never a legitimate
config, only what an attacker-shaped /ui frame would send. Guard both operands
with math.isfinite so the reject stays a strict no-op. Extends the existing
reject test with inf/nan.
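
The gap is easy to see in isolation. In this sketch `lower_bounds_only` mirrors the old check and `valid_rate` adds the finiteness guard; both function names are hypothetical, while the bounds themselves come from the commit message:

```python
import math


def lower_bounds_only(refill_rate: float, capacity: float) -> bool:
    """The old check: rejects NaN by accident, accepts +inf."""
    return refill_rate > 0 and capacity >= 1.0


def valid_rate(refill_rate: float, capacity: float) -> bool:
    """The hardened check: both operands must also be finite."""
    return (
        math.isfinite(refill_rate)
        and math.isfinite(capacity)
        and refill_rate > 0
        and capacity >= 1.0
    )


INF, NAN = float("inf"), float("nan")

old_accepts_inf = lower_bounds_only(INF, 5.0)   # the hole
old_rejects_nan = not lower_bounds_only(NAN, 5.0)
new_rejects_inf = not valid_rate(INF, 5.0)
new_rejects_inf_capacity = not valid_rate(1.0, INF)
new_rejects_nan = not valid_rate(1.0, NAN)
new_accepts_normal = valid_rate(2.0, 5.0)
```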
@obeone obeone merged commit d8b7984 into main Jun 19, 2026
6 checks passed
@obeone obeone deleted the feat/operator-controls-and-liveness branch June 19, 2026 17:22