
feat(log-tail): event-driven log-tail worker; headroom overhead → telemetry #17

Merged
mrap merged 1 commit into develop from feature/log-tail-worker on Jun 13, 2026

@mrap (Owner) commented Jun 13, 2026

What

Adds hex log-tail — a generic, event-driven log-tail daemon that captures an opaque upstream's signal into hex's unified telemetry + event bus, behind a boundary hex owns. First consumer: headroom proxy overhead (STAGE_TIMINGS.total_pre_upstream), so the standing "tolerate the intermittent proxy stall" decision becomes evidence-based instead of a blind spot.

Why this shape

The Rust worker SDK has no daemon primitive — handlers run on spawn_blocking and must return. A continuous tail -F in a *.worker.rs cron handler would therefore degrade to a poll, which is exactly what we set out to avoid. So this ships as an iii-exec-supervised CLI daemon (the same pattern that runs the headroom proxy), notify-driven (FSEvents/inotify/kqueue — blocks on kernel FS events, no timer).

Reusable seam (the generalizable core)

The only upstream-specific code is the LineObserver trait impl selected by --observer — a typed Rust trait checked at compile time, not a runtime config DSL. A new upstream = a new impl + a registry entry. HeadroomStageTimings is the first. Everything else (supervision, bus fan-out, telemetry store, offset checkpoint) rides hex's existing opinions.
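
As a rough illustration of this seam, the trait-plus-registry shape might look like the sketch below. Only the names `LineObserver` and `HeadroomStageTimings` come from this PR; the `observe` signature, the `Observation` struct, the registry key, and the `total_pre_upstream=<ms>` token format are assumptions made for the example, not the actual implementation.

```rust
use std::collections::HashMap;

/// One measurement extracted from a log line (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct Observation {
    pub duration_ms: f64,
}

/// The upstream-specific seam: map one raw log line to zero or one observation.
/// The method name and signature are assumptions for this sketch.
pub trait LineObserver {
    fn observe(&self, line: &str) -> Option<Observation>;
}

/// Illustrative observer. The real headroom log format is not shown in this PR,
/// so the `total_pre_upstream=<ms>` token parsed here is assumed.
pub struct HeadroomStageTimings;

impl LineObserver for HeadroomStageTimings {
    fn observe(&self, line: &str) -> Option<Observation> {
        // Take the text after the marker, then the first whitespace-delimited token.
        let rest = line.split("total_pre_upstream=").nth(1)?;
        let token = rest.split_whitespace().next()?;
        let duration_ms = token.parse::<f64>().ok()?;
        Some(Observation { duration_ms })
    }
}

/// Constructor type stored in the registry.
pub type Factory = fn() -> Box<dyn LineObserver>;

fn make_headroom() -> Box<dyn LineObserver> {
    Box::new(HeadroomStageTimings)
}

/// Maps `--observer` names to constructors. The key is hypothetical.
pub fn registry() -> HashMap<&'static str, Factory> {
    let mut m: HashMap<&'static str, Factory> = HashMap::new();
    m.insert("headroom-stage-timings", make_headroom as Factory);
    m
}
```

Under these assumptions, supporting a new upstream means adding one more `impl LineObserver` and one more `m.insert(...)` line; a misspelled method or wrong return type fails at compile time rather than at daemon startup.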

Design highlights

  • Generic engine: notify tail with inode + size rotation detection; byte-offset checkpoint in iii-state (no replay / no drop across restart); stall-episode coalescing (one event per event-loop stall, not per queued request); durable-first emit (telemetry SQLite first, bus best-effort).
  • At-least-once delivery (checkpoint persists after emit): consumers must be idempotent on (source, event, ts, duration_ms).
  • Pure cores (reader/coalesce/observer) are fully unit-tested; the notify loop is thin glue.
  • Golden-fixture contract test fails loud if headroom's log format drifts — guards against silent telemetry death on upstream upgrade.
  • engine-workers.example.yaml documents the headroom tailer stanza (disabled by default; the instance enables it).
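
To make the at-least-once contract concrete, here is a minimal sketch of a consumer that stays correct under redelivery by deduplicating on the (source, event, ts, duration_ms) key named above. The `EventKey` and `IdempotentConsumer` types, the field types, and the running `total_ms` aggregate are hypothetical; the actual event payload is not shown in this PR.

```rust
use std::collections::HashSet;

/// Dedup key from the PR description: (source, event, ts, duration_ms).
/// Field types are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct EventKey {
    pub source: String,
    pub event: String,
    pub ts: i64,          // assumed: Unix epoch milliseconds
    pub duration_ms: u64, // assumed: whole milliseconds
}

/// Hypothetical consumer that sums overhead while ignoring redelivered events.
#[derive(Default)]
pub struct IdempotentConsumer {
    seen: HashSet<EventKey>,
    pub total_ms: u64,
}

impl IdempotentConsumer {
    /// Returns true if the event was applied, false if it was a redelivery.
    pub fn handle(&mut self, key: EventKey) -> bool {
        let duration = key.duration_ms;
        // `insert` returns false when the key was already present.
        if !self.seen.insert(key) {
            return false;
        }
        self.total_ms += duration;
        true
    }
}
```

If the daemon crashes after emitting but before persisting its checkpoint, the same event is emitted again on restart; with a key check like this the second delivery is a no-op. The in-memory set here grows without bound, so a long-lived consumer would more likely rely on a unique index in its own store or a bounded window.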

Quality

Hardened by a 4-dimension adversarial review (correctness/concurrency, reliability/failure-modes, project-standards, maintainability). Fixed before merge: inode-blind rotation (silent data loss when the rotated file already grew past the old offset), episodes never flushing under steady traffic (the headline feature was defeated for the real consumer), --quiet-ms 0 busy-spin, swallowed iii-state errors (S6), watcher-disconnect exiting 0 (no respawn), and an emit argument-transposition risk.
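
The inode-blind rotation bug can be shown with a small pure function. The sketch below is not the code from this branch; the `Checkpoint` and `FileStat` types and both function names are hypothetical. It contrasts a size-only check, which misses a rotation whenever the new file has already grown past the saved offset, with a check that also compares inodes.

```rust
/// Identity and read position of the tailed file as last checkpointed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Checkpoint {
    pub inode: u64,
    pub offset: u64,
}

/// Current identity and length of the file on disk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileStat {
    pub inode: u64,
    pub size: u64,
}

/// Size-only check: assumes rotation happened only if the file shrank.
/// If the replacement file is already larger than the old offset, this
/// resumes mid-file and silently skips the new file's first bytes.
pub fn resume_offset_size_only(cp: Checkpoint, now: FileStat) -> u64 {
    if now.size < cp.offset { 0 } else { cp.offset }
}

/// Inode + size check: restart from byte 0 when the file was replaced
/// (different inode) or truncated in place (same inode, smaller size).
pub fn resume_offset(cp: Checkpoint, now: FileStat) -> u64 {
    if now.inode != cp.inode || now.size < cp.offset {
        0
    } else {
        cp.offset
    }
}
```

For example, with a checkpoint at offset 100 on inode 1 and a rotated file on inode 2 that is already 500 bytes long, the size-only version resumes at 100 and drops the first 100 bytes, while the inode-aware version resumes at 0. On Unix the inode is available from `std::os::unix::fs::MetadataExt::ino()`.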

Verification

  • cargo build + 437 lib tests pass (0 failed; 22 new for log_tail), release build green.
  • legacy-rename-guard: OK.
  • total_pre_upstream confirmed in milliseconds against the headroom_stage_timing_ms_* series (avoided a 1000× unit bug).

Review findings

All adversarial-review findings were applied in this branch (none deferred). One documented deviation: --event keeps a generic default (log.overhead) rather than the plan's headroom.overhead, since the worker is generic and the headroom deployment sets --event explicitly.

🤖 Generated with Claude Code

…emetry

Add `hex log-tail`, a generic event-driven log-tail daemon that captures an
opaque upstream's signal into hex's unified telemetry + event bus, behind a
boundary hex owns. First consumer: headroom proxy overhead
(STAGE_TIMINGS.total_pre_upstream), so the "tolerate the stall" decision becomes
evidence-based.

Design: the Rust worker SDK has no daemon primitive (handlers must return), so a
continuous tail can't be a *.worker.rs cron handler without becoming a poll. This
ships as an iii-exec-supervised CLI daemon (the headroom-proxy pattern),
notify-driven (FSEvents/inotify/kqueue — no polling).

Reusable seam: the only upstream-specific code is the `LineObserver` trait impl
selected by --observer (a typed Rust trait, not a config DSL). New upstream = new
impl + registry entry. `HeadroomStageTimings` is the first.

- Generic engine: notify tail with inode+size rotation detection, byte-offset
  checkpoint in iii-state (no replay / no drop across restart), stall-episode
  coalescing, durable-first emit (telemetry SQLite first, bus best-effort —
  at-least-once; consumers idempotent on source/event/ts/duration_ms).
- Pure cores (reader/coalesce/observer) unit-tested; the notify loop is thin glue.
- Golden-fixture contract test fails loud if headroom's log format drifts.
- engine-workers.example.yaml documents the headroom tailer stanza (disabled by
  default; the instance enables it in its own engine-workers.yaml).

Hardened via a 4-dimension adversarial review (correctness, reliability,
standards, maintainability) — fixed inode-blind rotation (silent data loss),
episodes never flushing under steady traffic, quiet-ms busy-spin, swallowed
state errors (S6), and an emit arg-transposition risk.

Verified: cargo build + 437 lib tests pass (0 failed); rename-guard OK; release
build green. total_pre_upstream confirmed in ms (avoided a 1000x unit bug).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@mrap merged commit b86557b into develop on Jun 13, 2026
5 checks passed
@mrap deleted the feature/log-tail-worker branch on June 13, 2026 08:55