feat(log-tail): event-driven log-tail worker; headroom overhead → telemetry #17
Merged
…emetry

Add `hex log-tail`, a generic event-driven log-tail daemon that captures an opaque upstream's signal into hex's unified telemetry + event bus, behind a boundary hex owns. First consumer: headroom proxy overhead (STAGE_TIMINGS.total_pre_upstream), so the "tolerate the stall" decision stays evidence-based.

Design: the Rust worker SDK has no daemon primitive (handlers must return), so a continuous tail can't be a *.worker.rs cron handler without becoming a poll. This ships as an iii-exec-supervised CLI daemon (the headroom-proxy pattern), notify-driven (FSEvents/inotify/kqueue, no polling).

Reusable seam: the only upstream-specific code is the `LineObserver` trait impl selected by --observer (a typed Rust trait, not a config DSL). New upstream = new impl + registry entry. `HeadroomStageTimings` is the first.

- Generic engine: notify tail with inode+size rotation detection, byte-offset checkpoint in iii-state (no replay / no drop across restart), stall-episode coalescing, durable-first emit (telemetry SQLite first, bus best-effort; at-least-once; consumers idempotent on source/event/ts/duration_ms).
- Pure cores (reader/coalesce/observer) unit-tested; the notify loop is thin glue.
- Golden-fixture contract test fails loud if headroom's log format drifts.
- engine-workers.example.yaml documents the headroom tailer stanza (disabled by default; the instance enables it in its own engine-workers.yaml).

Hardened via a 4-dimension adversarial review (correctness, reliability, standards, maintainability): fixed inode-blind rotation (silent data loss), episodes never flushing under steady traffic, quiet-ms busy-spin, swallowed state errors (S6), and an emit arg-transposition risk.

Verified: cargo build + 437 lib tests pass (0 failed); rename-guard OK; release build green. total_pre_upstream confirmed in ms (avoided a 1000x unit bug).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What

Adds `hex log-tail`, a generic, event-driven log-tail daemon that captures an opaque upstream's signal into hex's unified telemetry + event bus, behind a boundary hex owns. First consumer: headroom proxy overhead (STAGE_TIMINGS.total_pre_upstream), so the standing "tolerate the intermittent proxy stall" decision becomes evidence-based instead of a blind spot.

Why this shape
The Rust worker SDK has no daemon primitive: handlers run on `spawn_blocking` and must return. A continuous `tail -F` in a `*.worker.rs` cron handler would therefore degrade to a poll, which is exactly what we set out to avoid. So this ships as an `iii-exec`-supervised CLI daemon (the same pattern that runs the headroom proxy), `notify`-driven (FSEvents/inotify/kqueue; it blocks on kernel FS events, with no timer).

Reusable seam (the generalizable core)
The only upstream-specific code is the `LineObserver` trait impl selected by `--observer`: a typed Rust trait checked at compile time, not a runtime config DSL. A new upstream = a new impl + a registry entry. `HeadroomStageTimings` is the first. Everything else (supervision, bus fan-out, telemetry store, offset checkpoint) rides hex's existing opinions.

Design highlights
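As a rough illustration of the seam described above, here is a minimal sketch of what a `LineObserver` impl plus a registry lookup could look like. Only `LineObserver`, `HeadroomStageTimings`, and `--observer` come from this PR; the `Observation` struct, the `observe` signature, the `observer_by_name` function, the registry key, and the `key=value` log-line format are assumptions made for illustration, not the code in this branch.

```rust
/// One measurement extracted from a log line (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Observation {
    pub duration_ms: f64,
}

/// Upstream-specific parsing lives behind this trait; the engine only
/// needs to know whether a line yielded an observation.
/// The method name and signature are assumptions.
pub trait LineObserver {
    fn observe(&self, line: &str) -> Option<Observation>;
}

pub struct HeadroomStageTimings;

impl LineObserver for HeadroomStageTimings {
    fn observe(&self, line: &str) -> Option<Observation> {
        // Assumed line format: "... STAGE_TIMINGS ... total_pre_upstream=12.5 ..."
        if !line.contains("STAGE_TIMINGS") {
            return None;
        }
        let key = "total_pre_upstream=";
        let start = line.find(key)? + key.len();
        let rest = &line[start..];
        // Take the run of digits and dots that follows the key.
        let end = rest
            .find(|c: char| !(c.is_ascii_digit() || c == '.'))
            .unwrap_or(rest.len());
        let duration_ms = rest[..end].parse::<f64>().ok()?;
        Some(Observation { duration_ms })
    }
}

/// Registry: maps the value of `--observer` to an implementation.
/// The key string is hypothetical.
pub fn observer_by_name(name: &str) -> Option<Box<dyn LineObserver>> {
    match name {
        "headroom-stage-timings" => Some(Box::new(HeadroomStageTimings)),
        _ => None,
    }
}
```

Under these assumptions, adding a new upstream means one more `impl LineObserver` and one more `match` arm; nothing in the tail loop changes.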
- Generic engine: `notify` tail with inode + size rotation detection; byte-offset checkpoint in iii-state (no replay / no drop across restart); stall-episode coalescing (one event per event-loop stall, not per queued request); durable-first emit (telemetry SQLite first, bus best-effort).
- At-least-once delivery; consumers are idempotent on `(source, event, ts, duration_ms)`.
- Pure cores (`reader`/`coalesce`/`observer`) are fully unit-tested; the `notify` loop is thin glue.
- `engine-workers.example.yaml` documents the headroom tailer stanza (disabled by default; the instance enables it).

Quality
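To make the inode + size rotation rule concrete, a minimal sketch follows. All names here (`FileStamp`, `Checkpoint`, `resume_offset`, `stamp`) are hypothetical and the real engine may structure this differently. The point is only that comparing size alone misses a rotated file that has already grown past the stored offset, while comparing the inode catches it.

```rust
/// Identity and size of the tailed file at a point in time (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileStamp {
    pub inode: u64,
    pub size: u64,
}

/// Persisted checkpoint: which file we were reading and how far we got.
/// In the worker this would live in iii-state; here it is a plain struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Checkpoint {
    pub inode: u64,
    pub offset: u64,
}

/// Decide the byte offset to resume reading from.
pub fn resume_offset(cp: Checkpoint, now: FileStamp) -> u64 {
    // A different inode means a new file sits at the same path (rotation),
    // even if it is already larger than the old offset.
    let rotated = now.inode != cp.inode;
    // Same inode but shorter than the checkpoint means truncation in place.
    let truncated = now.size < cp.offset;
    if rotated || truncated {
        0
    } else {
        cp.offset
    }
}

/// Read the current stamp from the filesystem (Unix only).
#[cfg(unix)]
pub fn stamp(path: &std::path::Path) -> std::io::Result<FileStamp> {
    use std::os::unix::fs::MetadataExt;
    let meta = std::fs::metadata(path)?;
    Ok(FileStamp { inode: meta.ino(), size: meta.len() })
}
```

A size-only check would return the old offset in the second case of the usage below (new inode, size 500, stored offset 100) and silently skip the first 100 bytes of the new file.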
Hardened by a 4-dimension adversarial review (correctness/concurrency, reliability/failure-modes, project-standards, maintainability). Fixed before merge: inode-blind rotation (silent data loss when the rotated file already grew past the old offset), episodes never flushing under steady traffic (the headline feature was defeated for the real consumer), `--quiet-ms 0` busy-spin, swallowed iii-state errors (S6), watcher-disconnect exiting 0 (no respawn), and an `emit` argument-transposition risk.

Verification
`cargo build` + 437 lib tests pass (0 failed; 22 new for log_tail), release build green. `total_pre_upstream` confirmed in milliseconds against the `headroom_stage_timing_ms_*` series (avoided a 1000× unit bug).

Review findings
All adversarial-review findings were applied in this branch (none deferred). One documented deviation: `--event` keeps a generic default (`log.overhead`) rather than the plan's `headroom.overhead`, since the worker is generic and the headroom deployment sets `--event` explicitly.

🤖 Generated with Claude Code
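For reference, the stall-episode coalescing listed under Design highlights could be sketched roughly as follows. This is an illustration under stated assumptions, not the branch's `coalesce` module: the `Coalescer` and `Episode` types, the slow-sample threshold, and the exact flush rule are hypothetical, and `quiet_ms` merely stands in for `--quiet-ms`. The quiet check runs on every sample, so an open episode still flushes while fast traffic keeps arriving.

```rust
/// A coalesced stall: when it started, the worst duration seen, and how
/// many slow samples were folded into it (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Episode {
    pub start_ts_ms: u64,
    pub max_duration_ms: f64,
    pub samples: u32,
}

pub struct Coalescer {
    threshold_ms: f64, // assumed: samples at or above this count as "slow"
    quiet_ms: u64,     // gap after the last slow sample that ends an episode
    open: Option<(Episode, u64)>, // (episode, timestamp of last slow sample)
}

impl Coalescer {
    pub fn new(threshold_ms: f64, quiet_ms: u64) -> Self {
        Coalescer { threshold_ms, quiet_ms, open: None }
    }

    /// Feed one sample. Returns an episode if the quiet gap had already
    /// elapsed when this sample arrived.
    pub fn push(&mut self, ts_ms: u64, duration_ms: f64) -> Option<Episode> {
        // Check the quiet gap first, for fast and slow samples alike.
        let closed = self.flush_if_quiet(ts_ms);
        if duration_ms >= self.threshold_ms {
            // Extend the open episode, or start a new one at this sample.
            let (mut ep, _) = self.open.take().unwrap_or((
                Episode { start_ts_ms: ts_ms, max_duration_ms: duration_ms, samples: 0 },
                ts_ms,
            ));
            ep.max_duration_ms = ep.max_duration_ms.max(duration_ms);
            ep.samples += 1;
            self.open = Some((ep, ts_ms));
        }
        closed
    }

    /// Emit the open episode once `quiet_ms` has passed since its last
    /// slow sample. Safe to call from a timer tick as well.
    pub fn flush_if_quiet(&mut self, now_ms: u64) -> Option<Episode> {
        let quiet = matches!(
            &self.open,
            Some((_, last)) if now_ms.saturating_sub(*last) >= self.quiet_ms
        );
        if quiet {
            self.open.take().map(|(ep, _)| ep)
        } else {
            None
        }
    }
}
```

With a threshold of 100 ms and a quiet gap of 500 ms, two slow samples 100 ms apart followed by steady fast samples yield exactly one episode, emitted by the first sample that arrives 500 ms or more after the last slow one.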