feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart by anshss · Pull Request #306 · Metabuilder-Labs/tokenjam

anshss · 2026-06-25T13:08:44Z

Turns the Status page from a flat agent list into a real session-status dashboard for Claude Code — fixing the root bug where live sessions showed as "completed", then building the drill-in surfaces (detail / timeline / map), cross-session run grouping, and a zero-install first run. Rebased onto and merged with current main (v0.5.2).

Summary

Root fix: Claude Code/Codex export OTLP logs (not traces); each user_prompt becomes a zero-duration invoke_agent span that ingest mistook for a session completion — so every live session force-completed on its first prompt. Completion is now gated on real duration (end_time > start_time); non-end spans re-activate mistakenly-completed sessions.
Session lifecycle: active / idle / stale / closed tiers (SessionRecord.status_at), close-signal endpoint + tj session-end, archive view, per-terminal tiles, display labels (service.instance.id + manual overrides), and a claude shell wrapper for per-terminal naming.
Project grouping: dashboard tiles roll up by OTel service.namespace, with a server-side [agents.<id>].project fallback so already-running sessions group without a restart.
Session drill-in: a per-session detail view with model-mix + context-growth, a deterministic Timeline (transcript play-by-play incl. recursive subagent logs, no LLM), and a graphical Map of what the agent did per ask.
Cross-session Run grouping: a fan-out harness stamps tokenjam.run_id / parent_session_id; tj groups spawned workers into one Run (/api/v1/runs), plus a setup_harness MCP tool + integration doc.
Sub-agent cost attribution: sub_agent_id on spans, per-subagent cost breakdown, and a subagent right-sizing analyzer.
Zero-install first run + Claude-Code wedge: tj quickstart (npx tj quickstart / uvx), tj context cost diagnostic, tj tokenmaxx quota card, retroactive Opus quota audit, branded home screen on bare tj.
Cost/pricing correctness: capture cache-creation tokens separately from reads, current Anthropic/Haiku-4.5 rates, framing block on /api/v1/sessions/{id}, bulk-insert backfill (perf).

Merge with `main` (v0.5.2)

This branch diverged at v0.5.0 and now merges +107 mainline commits (proxy/policy, Analytics, pricing overrides, Dashboard UI, request-capture #209, resume-dedup #294). Notable reconciliations:

DB migrations: kept main's released migrations 6 (policy) & 7 (request-capture) at their numbers; renumbered this branch's session migrations to 8–13 (safe — the runner tracks applied versions as a set).
backfill.py: combined main's dedup-by-message.id ([backfill] Claude Code backfill over-counts tokens/cost via resumed-session message duplication #294) with this branch's sub_agent_id + content-capture + cache-write tracking; kept the bulk-insert path.
UI: adopted main's Dashboard direction (Overview retired), grafting this branch's StatusView grouping/archived/clickable cards + Traces shimmer onto it, and porting the Release v0.1.5 #19 backfilled-DB empty-state fix into DashboardView.
Bare tj: adopted main's branded home screen (Onboarding first-run experience: welcome banner + 'next steps' nudge #240); tj quickstart stays the explicit zero-install command.

Tests / Verification

Full suite green: 1397 passed (pytest tests/unit tests/synthetic tests/agents tests/integration).
ruff check + mypy clean across touched files.
UI guarded by static-grep regression tests (test_lens_ui_regression.py) + offline test (test_ui_offline.py); module JS validated with node --check.
Diff scope: 88 files, +15,552 / −648 vs main.

What's NOT in this PR

Live OTel autopsy of subagent trees — CC's live OTLP export is a flat 2-level tree; deep subagent reconstruction is deterministic from the on-disk JSONL only (Timeline path), not the live span path.
LLM-generated intent summaries — the Timeline/Map are deterministic (no generation) by design.
Active-time (idle-segmented) durations — tiles still show wall-clock; idle-gap segmentation is deferred.
Drift on logs-path sessions — needs an idle-sweep to fire session-end hooks; deferred.

🤖 Generated with Claude Code

@watch

…sions Claude Code / Codex export telemetry as OTLP logs. routes/logs.py maps each user_prompt event to a zero-duration invoke_agent span (end_time == start_time) that marks the START of a turn. ingest.py treated any invoke_agent span with a truthy end_time as a session completion, so every live session was force- completed on its first prompt and the drift/alert session-end hooks fired on every turn. Result on the dashboard: an actively-running session displayed as "completed" with 0 duration and 0 tokens (the Status tile surfaced whichever empty marker session was newest), while real cost only showed in the daily roll-up. Fix: - ingest._is_session_end(): only treat an invoke_agent span as a completion when it has a real duration (end_time > start_time). The SDK @watch() path still completes correctly; the logs-path turn markers no longer do. - Ongoing-activity spans re-activate a session left "completed" (self-heals in-flight sessions). Idle sessions are surfaced as "stale" at read time via SessionRecord.effective_status (existing SESSION_STALE_THRESHOLD). - status route: pick the active session by most-recent activity (COALESCE(ended_at, started_at) DESC) and report effective_status, so a live session wins over a freshly-spawned empty marker. Tests: factory make_invoke_agent_span(); 4 lifecycle regression tests + 1 DuckDB-backed status-route test proving the live session reads as active. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Status dashboard only showed per-repo agent tiles (claude-code, claude-code-harness, claude-code-godmode...), because tj names each agent after the git repo (cmd_onboard._derive_project_name). A user working across one org's repos saw their work scattered across many tiles with no project identity. Add OTel service.namespace as a "project" grouping dimension: - onboard: derive the git org (_derive_org_name) and write service.namespace alongside service.name into the project's .claude/settings.json. New --project flag overrides it (e.g. org "aquanodeio" -> "aquanode"). - ingest: capture service.namespace from resource attrs on every path — logs (Claude Code / Codex), live OTLP spans, and the in-process SDK — and persist it on the SessionRecord (late-resolved like plan_tier). - db: migration 5 adds sessions.service_namespace (nullable); upsert + _row_to_session round-trip it. - status API: each agent now reports its namespace. - dashboard: tiles group under a project header showing agent count and rolled-up cost; agents with no namespace fall under "Ungrouped". Net effect: every Aquanodeio/* repo rolls up under one "aquanode" project tile, no per-repo config needed. Tests: factory service_namespace support; ingest capture + late-resolve; logs-path resource extraction; status-route end-to-end; org-name parsing (https/ssh); migration count. Dashboard grouping verified in-browser. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two dashboard-accuracy fixes: 1. Duration: the Status tile picked the "latest completed" session by started_at, so a 40s fragment that started 3 min after the real session hid a 4.5-hour session that stayed active afterwards. Order get_completed_sessions by last activity (COALESCE(ended_at, started_at) DESC) instead. Fixes the tile across the status route, MCP, CLI status, and metrics — all of which take the limit=1 "current session". 2. Onboard now prompts for a project name (OTel service.namespace) with the git repo name as the default, instead of silently deriving the git org. A meta-repo (git repo "harness" holding all of "aquanode") can be named "aquanode" at onboard time. --project still skips the prompt for non-interactive use. Removes the now-unused _derive_org_name. Tests: ordering regression (long session vs later-started fragment); make_session gains started_at/ended_at overrides; onboard --project and interactive project-name prompt; existing onboard CLI tests pass --project. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

service.namespace rides in OTEL_RESOURCE_ATTRIBUTES, which an agent reads at startup — so an already-running Claude Code session can never start sending it without a restart. Add a server-side fallback: [agents.<id>].project maps an agent to a dashboard project, applied by tj regardless of what the wire carries. - config: AgentConfig.project (round-trips via _dc_to_dict / loader). - ingest: session.service_namespace falls back to the agent's configured project when the span carries none (so running sessions self-heal on their next span); an explicit wire namespace still wins. - status route: namespace falls back to the configured project at query time, so even already-completed sessions with NULL namespace group immediately — no backfill, no restart. - onboard: writes the chosen project into [agents.<id>].project too, so both the wire (future sessions) and server-side (running sessions) paths are set. Verified end-to-end: a session sending only service.name=claude-code-harness (no namespace) reports namespace=aquanode and its true 4h+ duration after just a daemon restart. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…inals) Several Claude Code terminals in one repo share a service.name, so they all report under one agent_id. The Status view collapsed an agent to a single representative session, hiding concurrent terminals. Now the status route emits one tile per recently-active session (last span within SESSION_STALE_THRESHOLD), falling back to the latest completed session when none are active. Each tile carries its own session_id, tokens, tool calls, duration, last-seen, per-session cost, and per-session alert count (when an agent has multiple). The dashboard renders a session-id line per tile and counts "N sessions" per project group (cost deduped by agent). Tests: concurrent-sessions-under-one-agent yields one tile each; updated the live-vs-marker test (both now show as their own tile). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Concurrent terminals show as opaque session-id tiles. Give them names two ways, resolved in priority order on the status route: 1. [session_labels] config override (full id or prefix match) — names an already-running terminal immediately, no relaunch. 2. OTel service.instance.id — durable per-terminal name set at launch via OTEL_RESOURCE_ATTRIBUTES; captured like service.namespace. 3. else None (UI falls back to the short session id). - config: TjConfig.session_labels dict ([session_labels] table). - service.instance.id captured across logs / OTLP / SDK paths onto SessionRecord.service_instance_id (migration 6); late-resolved. - status route: _session_label() resolves the name; each tile gets "label". - dashboard: tile header shows the label when present (agent id + short session id move to detail rows), else the agent id as before. Tests: instance.id capture; label priority (config > instance.id > none); session_labels round-trip; migration count 5->6. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rd wrapper Add a `claude` shell wrapper (installed by `tj onboard --claude-code`) so each terminal tags its session with a distinct `service.instance.id`, rendering as separate dashboard tiles — without hand-editing shell rc and preserving the project's `service.name` / `service.namespace`. - New `tj otel-resource-attrs` command (cmd_otel.py): prints the project's OTel resource attrs on a single bare line (`service.name=claude-code-<repo>` plus `service.namespace=<project>` when the agent has a project configured). Registered in main.py and added to `no_db_commands`. - `tj onboard --claude-code` installs an idempotent `claude()` function into `~/.zshrc` (always) and `~/.bashrc` (when present), behind begin/end markers so re-onboards replace in place. The wrapper consumes `--as <name>`, derives the instance id (--as value, else tty basename, else `unknown`), exports `OTEL_RESOURCE_ATTRIBUTES`, and runs `command claude` to avoid recursion. Portable across zsh and bash. - Tests for both the new command (with/without configured project) and the wrapper install (presence, idempotency, bashrc-only-when-present). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rchive Claude Code emits no "session closed" event, so tj can only see time-since-last-span. Model the lifecycle explicitly and let the `claude` wrapper report closes: - models: effective_status gains idle/closed tiers (active <5m, idle <4h, stale beyond); status_at(idle_threshold) keeps the property pure while the route honours configurable [sessions] idle_minutes (default 240). - config: TjConfig.session_idle_minutes round-trips via the [sessions] table. - db: close_sessions_by_instance / close_session_by_id (idempotent, only flips status='active' rows, bumps ended_at). - api: POST /api/v1/sessions/close (ingest-Bearer-protected) marks a terminal's active sessions closed; returns {"closed": n}. - cli: `tj session-end --instance/--session` POSTs best-effort over HTTP (no DB), exits 0 silently when the daemon is down. - wrapper: claude() reports session-end on return and via INT/TERM/HUP trap, preserving claude's exit status. Idempotent close makes double-fire safe. - onboard: stop writing OTEL_RESOURCE_ATTRIBUTES to project settings.json (the wrapper owns it per-terminal; settings env would override it and collapse tiles) and delete any pre-existing one to migrate older setups. - status route: only active+idle sessions become tiles (no completed/closed fallback), capped at 6 per agent with a surfaced overflow count; new `archived` list returns closed+stale sessions (cap 50). - ui: idle (amber) + closed styles, an Archived sessions table, and a "+N more" affordance for capped project groups. Gates: ruff clean, mypy clean, 639 passed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Clicking a Status tile navigated to the global `#/traces` firehose, dropping the clicked session's identity and landing the user on an unfiltered, all-agents trace list. The header promised "details" but zoomed out. Add a real session-scoped destination instead. - GET /api/v1/sessions/{session_id}: per-session rollup (cost/tokens/tools/ alerts/drift) plus the session's own traces, so the UI can drill into the existing waterfall. Read-only; require_api_key; 404 as JSONResponse with response_model=None. Parameterised db.conn SQL only. - UI SessionDetailView + router case + click fix (index.html:680 now routes to `#/sessions/<id>`, archived rows clickable). Plan-tier-honest cost framing: "Implied API value" for subscription, "Local model — no API cost" for local, real cost for API; no invented spend, no fabricated agent tree (the live Claude Code telemetry is flat — documented in the route). - Stop tracking .tj/config.toml: every `tj` run from the repo cwd rewrites it and rotates the committed ingest_secret. The `.tj/` ignore rule already exists; this just drops the stale tracked copy. Tests: +5 integration tests (rollup/tools/traces, unknown→404, subscription plan_tier→pricing_mode, drift baseline, api-key auth). ruff + mypy clean, 644 pass. Layer 1 of the run-autopsy arc; subagent-tree capture at ingest (Layer 2) is the next step. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…view Layer 2 of the run-autopsy arc. The session detail view now shows how a session split across models and how its context grew over the run — both derived from existing gen_ai.llm.call spans, so it works on every session (live + backfilled) with no schema change. - GET /api/v1/sessions/{session_id} gains turn_count, model_mix (per-model calls/tokens/cost rollup) and context_series (time-ordered input-token series, downsampled to <=120 points, first+last preserved). Parameterised db.conn SQL; span name from the GenAIAttributes semconv constant. - UI "Models & context" section: model-mix table + a dependency-free CSS bar chart of input (context) tokens per LLM call. Descriptive only — no model- routing-quality claims; subscription caveat reused on cost. Tests: +3 integration tests (model_mix aggregation/order, turn_count, context_series ordering + downsample cap). ruff + mypy clean, 647 pass. Deferred (recorded in .plans): parentUuid within-session branch tree; the cross-session spawn graph (needs a harness-emitted spawn marker — no clean parent->child link exists in on-disk Claude Code data). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…unit Layer 3 of the run-autopsy arc. A fan-out harness (e.g. the meta-repo governor) stamps tokenjam.run_id (and optional tokenjam.parent_session_id) as OTel resource attributes on each worker session it spawns; tj groups those sessions into one Run. Linkage is DECLARED by the spawner — Claude Code OTLP carries no native parent<->child edge, so it is never reverse-engineered. - semconv: TjAttributes.RUN_ID / PARENT_SESSION_ID. - SessionRecord gains run_id + parent_session_id; migration 7 adds the columns (ADD COLUMN IF NOT EXISTS — fresh-DB and upgrade safe). - Ingest captures the markers from resource attributes on both paths (otel/otlp_parsing.py for the spans/OTLP path, api/routes/logs.py for the Claude Code logs path) and self-heals null-on-update (never overwrites). - API: run_id/parent_session_id on GET /api/v1/sessions/{id}; new GET /api/v1/runs/{run_id} (totals + member sessions + parent-edge tree) and a GET /api/v1/runs index. 404 as JSONResponse + response_model=None. - UI: RunDetailView (#/runs/<id>) — run rollup + sessions indented by spawn parent; "Run" link on the session detail. Plan-tier-honest cost framing (mixed -> implied API value, never a hard spend claim). Tests: +7 integration tests (logs+spans ingest capture, session-detail exposure, run grouping/aggregation/tree, unknown->404). ruff + mypy clean, 654 pass. Harness side is a documented contract (set the resource attrs at spawn), applied separately in the user's governor. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

L3 added tokenjam.run_id / tokenjam.parent_session_id resource-attribute capture to the HTTP/OTLP (otlp_parsing.py) and Claude Code logs (logs.py) paths but missed convert_otel_span — so a Python SDK app using the in-process exporter would never join a Run. Extract the markers there too, mirroring the existing service.namespace / service.instance.id extraction, so run grouping works across all three ingest paths (CC logs, HTTP/OTLP incl. the TS SDK, and the in-process Python SDK). +2 unit tests for convert_otel_span resource extraction (also the first direct coverage of that block). ruff + mypy clean, 656 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A "Story" section in the session view that explains, step by step, what a Claude Code session was trying to do and how it went — surfacing the agent's own narration threaded with its literal tool calls and ok/error outcomes. No LLM, no generation: read live from the on-disk CC JSONL transcript, nothing stored in the DB (capture posture unchanged). - core/transcript.py: build_session_story() locates ~/.claude/projects/*/<session_id>.jsonl (session_id == transcript filename, verified 100% across cli + sdk-cli), parses task / steps (narration + per-tool {name, label, status} + is_error/is_retry flags) / outcome. Caps + truncation; NEVER returns full tool inputs/outputs — only a short arg label + ok/error (privacy + bounded payload). - GET /api/v1/sessions/{id}/story (require_api_key, response_model=None); {available: false} at HTTP 200 for SDK/no-transcript sessions. Projects root overridable via app.state / TJ_CLAUDE_PROJECTS_ROOT for tests. - UI Story section: Task callout, step list (expandable narration, tool chips, error tint, ↻ retry marker, omitted markers), Outcome callout. Dependency-free. CC-only by design (SDK sessions have no CC JSONL → graceful unavailable state). +15 tests (unit parser + API available/unavailable). ruff + mypy clean, 671 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…Alerts / Traces) The session card stacked six sections vertically — too much scroll. Split the lower content into tabs while pinning the header + Overview/Cost summary cards at top. Story is the default tab; Tools folds under "Models & context", Behavioral drift under "Alerts". Pure layout change (htm/Preact + CSS), no backend or data-flow change; data still loads as before, only display is gated by the active tab. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

StatusView's empty-state guard only checked data.agents (active + idle), so once all sessions aged past the 4h idle window into `archived`, the dashboard showed the cold-start "No agents found / pip install" screen despite having 50 archived sessions to display. Guard now also requires `archived` to be empty before showing the cold-start state, so the (clickable) archived-sessions table renders whenever there is any history. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Each Agent/Task step in a session's Story now expands into that subagent's own story (task -> steps -> outcome), recursively, so a session's full log includes the work of everything it spawned. Subagent transcripts are read from ~/.claude/projects/<proj>/<session_id>/subagents/agent-<agentId>.jsonl, linked by the agentId carried in the parent step's tool_result (exact match, no heuristic). - core/transcript.py: include_subagents (default true) resolves each Agent/Task step's child agentId from its tool_result, loads agent-<id>.jsonl from the root session's subagents/ dir, and nests its story under the step (step.subagent), recursively. Guards: MAX_SUBAGENT_DEPTH, a shared step-budget across the tree, and an agentId cycle-set; depth/budget caps are surfaced, never silent-dropped. Privacy unchanged at every depth (narration + short tool label + ok/error only). - GET /sessions/{id}/story: Agent steps carry a recursive `subagent`; ?subagents=false returns the flat single-session story. - UI: collapsed "> subagent: <name> - N steps" disclosure under each Agent step; expands to the subagent's task/steps/outcome indented, recursively (its own Agent steps expand too). +9 tests (parent->child->grandchild nesting, agentId resolution, depth/budget caps, cycle guard, ?subagents=false). ruff + mypy clean, 680 pass. Real-data: this session nests 12/12 spawned subagents. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The session activity section was a raw wall of full-card steps (a 356-step session rendered ~32k px tall). Make it scannable and address the naming/order feedback. - Rename the user-facing tab + section "Story" -> "Timeline" (the /story endpoint path is unchanged/internal). - Newest-first display: renderStepsNewestFirst() reverses the step list (top level AND nested subagents) while keeping the #n labels (so Metabuilder-Labs#1 is still the first action). The Task callout stays pinned at top and Outcome at bottom; only the steps reverse. - Compact rows: each step is now a one-line row (#n - time - tool chips ok/error - clamped first line of narration); click to expand the full narration + detail. The repetitive per-row model label is dropped and shown only when the model changes (prevModel). Error tint, retry markers, and the recursive subagent disclosure are preserved. UI-only (tokenjam/ui/index.html); ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The session-detail Timeline (StorySection) fetched /sessions/:id/story once on mount with no polling, unlike every other view. A live session's Timeline froze at page-load and only refreshed on remount (tab switch / re-navigate). Add setInterval(load, 10000) matching the house idiom, preserving the last-good Timeline on transient poll failures. Apply the same polling fix to TraceDetailView, which had the identical once-only fetch (selection is by span_id, so re-fetching spans preserves it). Also remove the pinned top-level Task and Outcome callouts: in a long-running session the first prompt goes stale as new prompts are sent, and Outcome was just the last assistant message (already the top step) mislabeled as a final outcome on a live session. The steps list already shows everything in descending time order. Per-subagent Task/Outcome callouts stay (scoped, not stale). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Each Timeline step's text was hard-trimmed to 400 chars server-side, with the full narration discarded. The UI's expand-a-step feature could therefore only ever reveal 400 chars ending in "…" — "show more" was lying, since the rest was never sent. Raise MAX_STEP_TEXT_CHARS to 100K so it acts as a safety guard against a pathological single blob rather than a preview trim; the UI already shows only the first line collapsed and the full text when expanded. Real assistant responses now render complete. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ended_at is a session's last-activity time, shown as "Last seen" in the UI. close_session(s) advanced it to the close moment (CASE ... ended_at < now), so a session closed days after its last span reported "Last seen 2h ago" and a multi-day duration, while its actual last telemetry/transcript message was days earlier. Closing is an admin action, not telemetry. - Both close methods now use ended_at = COALESCE(ended_at, now): stamp only when NULL, never advance a real last-span time. - Migration 8 repairs already-closed rows: recompute ended_at from each closed session's max span time, lowering only (idempotent, leaves consistent rows). Tests: close preserves ended_at / stamps when NULL (integration); backfill lowers bumped rows, leaves consistent rows, ignores active sessions (unit). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two status-page UX fixes: - Scroll memory: returning from a session/run detail page (e.g. clicking an archived session, then Back) landed at the top of the Status page instead of where you were. The window scrolls (.main#app has no own overflow) and the async list load defeats native scroll restoration. Add a useScrollMemory hook that saves window scroll per view and restores it once the list has rendered. - "restore session" button on every archived (closed/stale) and idle session. Clicking copies `claude --resume <session-id>` to the clipboard so the session can be picked back up in the terminal. Copy uses execCommand inside the click gesture (reliable even when the async Clipboard API hangs on a permission prompt) with a best-effort navigator.clipboard upgrade, and stopPropagation so it doesn't trigger the row/tile navigation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The "restore session" button belongs on the session detail page (where you're looking at one archived/idle session), not on the Status list. Move it: render it under the session title in SessionDetailView when the session is not active (closed / stale / idle), and remove it from the Status page archived table and idle tiles. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…g row Move the restore button onto the title row, right-aligned next to the session heading, instead of stacked under the Session id line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a copy glyph to the restore-session button and a styled on-hover tooltip that tells you to paste & run the command in your terminal, showing the exact `claude --resume <id>` line. Replaces the plain native title tooltip. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Claude Code (and the OTel/SDK paths) emit cache_creation_tokens, but every ingest parser dropped them — only cache_read landed in the single cache_tokens field. The dashboard's "Cache tokens" therefore understated total cache usage by the cache-write amount, and cache-write tokens were never priced. Add a first-class cache_creation_tokens field through the stack while keeping cache_tokens = cache reads (analyzers price it at the cache-read rate): - models: cache_creation_tokens on NormalizedSpan + SessionRecord - db: migration 9 adds the column to spans + sessions; wired through insert/upsert/row-mappers - ingest: aggregate cache_creation_tokens per session - cost: CostEngine now prices cache writes at the cache-write rate - ingest paths: logs.py (Claude Code), otlp_parsing.py, provider.py; backfill realigned to the same convention (was the lone outlier) - api/dashboard: expose cache_creation_tokens (sessions/runs/traces); "Cache tokens" now renders reads + writes Tests: +2 regression tests; updated tests asserting the old behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Reconcile the branch's session-detail / dashboard / transcript / run-grouping work with 98 commits from main. Notable conflict resolutions: - Cache read/write split: main shipped the same fix as `cache_write_tokens` while this branch used `cache_creation_tokens`. Converged on main's `cache_write_tokens` everywhere (models, db, ingest, cost, logs, otlp, provider, backfill, api routes, ui). Kept the branch's session-level aggregation + dashboard display (main only tracked it at span level). - Migration number collision: both branches added migration 5+. Kept main's released migration 5 (cache_write on spans), renumbered the branch's to 6-9, and added migration 10 for session-level cache_write (with a defensive spans IF NOT EXISTS to repair DBs that consumed v5 as the namespace column). - backfill: kept the branch's read/write split (more correct than main, which still summed both into cache_tokens), under the cache_write_tokens name. - dashboard: took main's offline-vendored preact/uplot imports + optimize route + params plumbing; kept the branch's StatusView (project grouping, archived sessions, click-to-detail), SessionDetailView, and run routes. "Cache tokens" now shows reads + writes. - tests: unioned test_cli additions from both sides; kept the branch's dynamic migration-count assertions in test_db. Verified: 836 tests pass, ruff + mypy clean, dashboard loads headless with no console errors and renders the correct cache-token total. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Claude Code records each Task-tool subagent's turns in <session>/subagents/agent-<id>.jsonl, tagged with the parent's sessionId. Backfill folded those spans under the parent session but discarded the subagent identity, so a session's cost could not be broken down per subagent -- yet a single research run can spawn 100+ subagents that drive most of the spend (verified on a real session: 66% of $642 across ~147 subagents, previously invisible inside one parent total). - NormalizedSpan.sub_agent_id + spans.sub_agent_id column (migration 11) - backfill sets it from a record's top-level agentId when isSidechain is true; None on the main thread - both span write paths (db.insert_span, backfill) + _row_to_span carry it - make_llm_span factory gains a sub_agent_id arg Enables GROUP BY sub_agent_id for per-subagent breakdown / right-sizing. Known limitation (separate latent bug, not addressed here): backfill upserts the session row once per file with replace semantics, so sessions.total_cost_usd reflects only the last-processed subagent file. Span-derived cost (get_session_cost / get_cost_summary) stays correct. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A Claude Code session is split across files sharing one session_id (main thread + subagents/agent-*.jsonl). Backfill upserts the session row once per file, and upsert_session uses replace semantics, so the row ended up holding only the last-processed subagent file's totals instead of the sum -- sessions.total_cost_usd and token counts were wrong for any multi-agent session. After ingesting, reconcile each touched session row's token + cost aggregates to SUM over its spans (the source of truth), via the new DuckDBBackend.recompute_session_totals_from_spans. Scoped to the sessions backfill actually touched, so live-ingested rows are untouched. Idempotent: re-running backfill also repairs rows written by an earlier (pre-fix) run. Span-derived cost (get_session_cost / get_cost_summary) was already correct; this fixes the stored sessions.* aggregates. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

New optimize analyzer that breaks a window's cost down per subagent (sub_agent_id) and flags structural right-sizing candidates: - over_powered: premium (Opus-tier) model, little output, few tool calls - over_provisioned: large context (input + cache) but little output Honesty discipline (CLAUDE.md Rule 14): candidate flags only, never a quality claim; the caveat is surfaced verbatim and the recoverable estimate is left None (we report the spend concentrated in flagged subagents, not a guaranteed saving). Registered as "subagent" (auto-discovered; appended to ANALYZER_ORDER), so it flows through get_optimize_report (MCP) and /api/v1/optimize via a dict round-trip constructor, with a pricing-mode-aware CLI renderer. Verified on a real session: 147 subagents = 66% of a $642 window, of which 25 are flagged ($109) -- Opus subagents fed 600K-1M cache tokens that produced <1K output (over_provisioned). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Backfill is idempotent by span_id, so a plain re-run skips spans already in the DB and never populates sub_agent_id on history ingested before that column existed. --reingest UPDATEs the existing spans in place (sub_agent_id refreshed) instead of skipping them -- no new rows, no duplicates -- so accumulated history becomes attributable per subagent. Surfaced via the new BackfillResult.spans_retagged counter and the CLI summary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make run linkage work for any fan-out harness, not just ones that announce a run id. A harness groups its sessions by stamping the OTel resource attribute tokenjam.run_id on the launcher + every worker; tj then groups them automatically. - core/harness_setup.py: generates the drop-in run-env helper (mints one run id per launch, exports tokenjam.run_id), the Python equivalent, and a bounded, prioritized scan for a repo's spawn points (harness-like files first so a big monorepo doesn't exhaust the budget; relative-path ranking; narrow regex to avoid false-positive floods). - mcp/server.py: setup_harness(mode="instrument"|"map") — instrument writes the helper + reports spawn points and exact wiring for Claude to apply (never blind-edits harness code); map makes no changes and reports how tj already groups the harness's sessions. Verified against the real aquanode governor. - docs/harness-integration.md: the three-tier contract (automatic Task subagents / tokenjam.run_id convention / best-effort inference) + the honest boundary; linked from the README. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

…ration)" A pure tool turn (the model fired a tool with no text that turn) rendered as a dim "(no narration)" — you had to expand the row to see what it did. Surface the tool's label/command inline instead (Bash command, Read/Edit file path, etc.), monospace, for any tool. The inline chips still carry the tool name + ok/error status; expand still shows the untruncated detail. Faithful: narration, when present, still wins. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

A failed tool step was boxed in red but gave no reason — you had to open the transcript to learn why. Capture the tool_result's error text (wrapper-stripped, length-capped) on the step and surface it in the expanded body. Remove the red border around the row; the ✗ chip plus the expanded error block carry the signal. Descriptive only — the message is the transcript's verbatim text. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

Pressing "Distill titles" worked but felt broken: it reverted on reload, gave weak feedback during the ~10s call, and showed "couldn't reach claude" even when it simply had nothing to distill. - Auto-apply on load: a cache-only peek (core.distill.peek_cached_titles, never calls claude) re-applies an already-distilled session, so a press sticks across reloads/navigation at zero cost. The /distill route gains ?cached_only. - Honest note: the route returns candidate_count so the UI tells "nothing long enough to distill" (success) apart from "claude unreachable" (failure); button shows Distill titles / Distilling… / Distilled ✓. - Feedback: a brief highlight flash on the headlines when a distill applies. Verified live: cached titles auto-apply on load with no claude call; manual distill flashes + updates the note. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

…ration) 11 days of Claude Code use produced 14,846 alerts, 96.6% of them false: retry_loop fired on normal repeated tool use because it keyed on tool NAME only (4 different Bash/Read/WebSearch calls in 6 spans), counted non-execution tool_decision event spans, and CC OTLP carries no tool arguments to tell an identical repeat apart. drift treated heterogeneous interactive sessions as one baseline (every longer answer "drifts"); session_duration flagged hour-long governor runs against a 3600s SDK-era default. - retry_loop now fires only on a GENUINE loop: same tool + identical arguments, counting only gen_ai.tool.call spans. A privacy-safe argument signature (tokenjam.tool_arg_sig) is computed at ingest before the raw tool input is stripped, so it works under any capture config; telemetry with no args (CC over OTLP) yields no signature and never trips this. Genuine CC retries stay visible in the Map/Timeline (transcript-derived is_retry). - drift + session_duration are skipped for interactive coding agents (claude-code* / codex*) where they're structurally meaningless; fully active for SDK / production agents with stable workloads. - shared utils/signatures.tool_arg_signature; make_tool_span gains tool_input/ name; the retry-loop demo models a real identical-args loop. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

failure_rate detection is correct (it caught real rate-limits, context-window overflows, timeouts, server errors), but it re-fired on every additional error past the threshold — one struggling session emitted ~8 near-identical alerts (5/20, 6/20, 7/20 …), inflating a few incidents into dozens of rows. Track sessions that have already alerted and fire at most once per session. Replaces the oscillation-guard heuristic with a clean per-session latch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

Reconcile the session-status redesign (per-session tiles + archive, SessionDetailView, scroll memory, RestoreButton, work map) with main's framing rollout (Metabuilder-Labs#191), active-compute-time (Metabuilder-Labs#147), backfill plan-tier propagation (Metabuilder-Labs#176), and the upsert never-clobber-known-tier CASE. Conflict resolutions: - status.py: union imports; keep tile/archive structure, compute active_seconds per tile, keep the framing block. - db.py: combine main's plan_tier CASE with the branch's new service_namespace/instance/run/parent column updates. - backfill.py: union reingest re-tagging with config->plan_tier. - cmd_backfill/cmd_onboard: union new args / restart banner + tile guidance. - index.html: keep both branches' helpers; route SessionDetailView cost cells through fmtFramedDollar and relabel Duration -> Elapsed so the merged framing regression tests hold; agent tile uses framed cost_today. - tests: concatenate both UI regression sets; update two main tests to the branch's new contracts (active-session tile, ParsedSession cache-write arg). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

… flags (Metabuilder-Labs#3) The Claude Code JSONL backfill parser recomputed cost from token/model metadata but never extracted per-message content or tool-input from the transcript — even though that data is present and core/transcript.py already reads it. Because backfill bypasses IngestPipeline/strip_captured_content, the [capture] toggles were a no-op on the CC path: flipping capture.prompts changed nothing. This blocked the context-cost diagnostic (Metabuilder-Labs#4), which needs per-message content + inclusion args to attribute tokens. parse_claude_code_session now accepts a CaptureConfig and, gated per toggle, populates the assistant LLM span with gen_ai.prompt.content (the triggering human prompt) and gen_ai.completion.content (the agent narration), and each tool span with gen_ai.tool.input (the raw args). The keys match GenAIAttributes so downstream consumers and alert content-stripping treat backfilled content identically to live content. ingest_claude_code reads config.capture and forwards it; iter_claude_code_sessions threads it through. Extraction is strictly opt-in: the None default and the all-False CaptureConfig default leave every span's attributes byte-for-byte unchanged ({"source": ...}), so a default backfill is unchanged and stays 100% local. Reuses _block_text from core/transcript.py for the record-walking. Co-Authored-By: Claude <noreply@anthropic.com>

Metabuilder-Labs#4) Claude Code's built-ins leave a real, documented gap: /compact is reactive, lossy and single-session; /context shows current-session totals only. Neither attributes WHAT is burning quota across sessions nor suggests a structural fix. A whole DIY ecosystem (ccusage, codeburn, context-analyzer, session-recall, ...) has sprung up to fill it — the strongest revealed-demand signal. Proof point: anthropics/claude-code#24147, where a dev hand-parsed 30 days of JSONL to find CLAUDE.md re-reads consumed 99.93% of their quota. `tj context` runs a local diagnostic over Claude Code sessions and reports: 1. Per-turn context composition — what share of each turn went to RE-READING prior context (cache-read tokens: conversation history, CLAUDE.md, accrued tool output) vs. NET-NEW WORK (uncached input + output), with the re-read overhead named. 2. Recurring inclusions — the same file re-read across many sessions, frequency-counted, each with a concrete `@file` / CLAUDE.md structural fix (capture-gated on `[capture] tool_inputs`). 3. Compact candidates — sessions whose accumulated re-reading makes a mid-session /compact reclaim the most quota. Framing is quota-native (the subscription majority): headline numbers render as token-share / "% of cycle tokens" via core/framing.py — the single source of truth for plan-tier-aware rendering — for Pro/Max users. Dollars are a SECONDARY calibration signal for API users, never the headline. Output is a screenshottable terminal card; `--json` for machine-readable output. Honesty discipline (CLAUDE.md Rule 14): every figure is a measured token share or a structural candidate flag, never a guaranteed saving; re-read tokens are cache reads (billed at a reduced rate, not free) — real quota, stated as such. Needs a direct DuckDB connection (reads the raw `attributes` column, which the API shim doesn't expose) — fails gracefully with a `tj stop` hint when the daemon holds the lock, mirroring `tj report --trim`. Scoped OUT of v1 (named, not faked): MCP schema-injection attribution and prompt-cache-miss attribution aren't derivable from current backfill data; recurring-inclusion detection covers file Reads only. Per the Metabuilder-Labs#3 handoff, content lands only on freshly-ingested spans after enabling `[capture]` — the command surfaces that nudge rather than silently showing empty recurring data. Covered by tests/unit/test_context_diagnostic.py over a synthetic multi-session fixture: per-turn re-read-vs-work composition, cross-session recurring-inclusion detection with structural fix, compact-candidate detection, the capture-off nudge, and end-to-end CLI rendering of quota-share for a Max plan. Co-Authored-By: Claude <noreply@anthropic.com>

…abuilder-Labs#6) Distribution research (adoption.md) is blunt: 15-second-TTFV `npx` tools dramatically out-adopt 2-minute-install tools, and every high-star CLI in this niche has sub-30s time-to-first-value. TokenJam's `pipx install` + launchd/systemd daemon + onboarding is great for power users but measurably higher friction than `npx ccusage` for the Claude Code crowd — who reach for `npx` first and whose JSONL tj reads is the very same file ccusage parses. This ships a no-setup first-run path so a user with zero prior setup runs ONE command and sees where their Claude Code quota actually goes in well under 30s. - `tj quickstart` (cmd_quickstart.py): opens a transient InMemoryBackend (nothing written to ~/.tj, no config read/written, no daemon started or contacted), backfills ~/.claude/projects/*.jsonl into it, then renders quota composition (reusing Metabuilder-Labs#4's context_diagnostic engine) + a session timeline (new core/session_timeline.py). Output leads with ccusage-parity framing. Registered in `no_db_commands` so the CLI never opens the on-disk DB / trips the daemon lock for it. - Bare `tj` (no subcommand) routes to quickstart — the group is now `invoke_without_command=True`, so `uvx --from tokenjam tj` is one command. `--help`/`--version` are eager and still short-circuit. - `npm-wrapper/`: a dependency-free npm package named `tj` so `npx tj` works — a thin launcher that shells out to the Python CLI via the first available runner (uvx -> pipx run -> installed tj) and passes args through. NOT published here (build + document only). - Docs: README + docs/installation.md lead with `npx tj` / `uvx --from tokenjam tj` as the primary starting point; `tj onboard` stays the opt-in "go deeper" (daemon/MCP/live) path. Honesty discipline (Rule 14) preserved: every figure is a measured token share re-derived from the JSONL, never a projected saving. Tests: tests/unit/test_quickstart.py covers the timeline core, the CLI render, the JSON output, the no-logs branch, and that quickstart + bare `tj` never open the on-disk DB (open_db patched to raise). Validated with `make all` — ruff + mypy clean, full suite 1171 passed. `uvx --from <repo> tj` and the npm wrapper passthrough were exercised locally end-to-end. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Every other dollar-bearing read route (/cost, /optimize, /budget, /traces, /status) returns a `framing` block (core/framing.py, Metabuilder-Labs#191), but the session detail endpoint did not. During the origin/main merge SessionDetailView's Overview / subagents / traces cost cells were routed through fmtFramedDollar(..., framing) to satisfy main's global framing regression, yet framing resolved to null there — so fmtFramedDollar silently fell back to raw fmtCost and subscription users (e.g. Claude Max) saw verbatim dollars on the session detail page that are suppressed everywhere else. Compute a framing block in get_session_detail via compute_framing (the single source of truth) with a window-independent plan mix scoped to the session's agent (plan_determination_mix, mirroring /status and /traces), passing the session's own token total + cost as window totals so the subscription token-share denominator is present. No UI change needed — SessionDetailView already reads data.framing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add `tj quota-audit` — the accountability companion to opusplan / `/model`. Those are forward-looking and say nothing about session history; nothing answered the backward-looking question "which of my PAST Opus sessions were Sonnet-shaped?". This runs the structural downsize heuristic retroactively, scoped to Opus sessions, and reports: - the headline "% of your Opus quota reclaimable from Sonnet-shaped sessions" computed as candidate-Opus-tokens / total-Opus-tokens — quota language, not dollars (the subscription majority is on a flat fee, so dollar framing mis-targets them; implied dollars are a secondary API-only calibration line); - the specific example sessions to spot-check; - an optional tuned routing-config export (`--export-config claude-code`, reusing the existing snippet generator via a DowngradeFinding shim). Framed as an audit (quota terms) with the honesty caveat always visible — "candidates to spot-check, never safe-to-downgrade". Computes purely from already-backfilled token/model metadata; does not depend on captured content. `audit_opus_quota()` + `OpusQuotaAudit` / `OpusAuditExample` extend model_downgrade.py / types.py; framing reuses core/framing (Metabuilder-Labs#4 pattern, window-independent plan mix per Metabuilder-Labs#177). Covered by a synthetic-mix test (Sonnet-shaped Opus candidates vs Opus-shaped Opus vs excluded Sonnet sessions) asserting the % reclaimable, the spot-check listing, and the caveat render. Co-Authored-By: Claude <noreply@anthropic.com>

…y card The "tokenmaxxing → tokenminimizing" cultural shift (June 2026) means a spend-brag card now repels while an efficiency/quota card aligns, and the subscription majority has no dollar spend to brag about at all. The old card rendered an (ironic) spend-tier ladder — the wrong polarity. Reframe the card to a quota/efficiency artifact: - Lead with the context-COMPOSITION headline from Metabuilder-Labs#4's compute_context_diagnostic: what share of quota went to overhead (re-reading history / CLAUDE.md / tool output) vs real work (uncached input + output). - Classify into 5 efficiency tiers keyed on the overhead share (lower = leaner = a better tier), replacing the 6-tier spend ladder. - Render quota-native via core/framing, mirroring Metabuilder-Labs#5's polarity: headline is a token-share / "% of cycle tokens"; dollars are demoted to a secondary "Implied API value" line shown only for api plans, suppressed for subscription / local / unknown. No dollar spend-brag for subscription users. - Frame the action line as reclaimable quota (points at tj context). - Add a --weekly "Quota Wrapped" recap preset (7-day window + recap copy). The card now needs a direct DuckDB connection (it reads context composition), matching tj context / tj quota-audit. Tests rewritten to assert the efficiency framing (composition + reclaimed, not spend-brag) and quota-native suppression of dollars for a subscription plan, plus weekly mode and the api secondary line. Updated the two forward-looking release checklists and CLAUDE.md to match. Co-Authored-By: Claude <noreply@anthropic.com>

…bs#8) Real Claude Code backfills carry the current Anthropic model ids, but `claude-fable-5` and `claude-sonnet-4-5` had no entry in the packaged pricing table. `get_rates` returned None for them, so `calculate_cost` fell back to the $0.50/$2.00-per-MTok default and logged a "No pricing data" warning — silently mis-pricing the secondary API-plan "implied value / calibration" dollar line in `tj context`, `tj quota-audit`, and `tj tokenmaxx`. Surfaced while building Metabuilder-Labs#6. Add both entries with the published per-MTok rates (https://platform.claude.com/docs/en/pricing): - claude-fable-5: input 10.00 / output 50.00 / cache-read 1.00 / cache-write(5m) 12.50 - claude-sonnet-4-5: input 3.00 / output 15.00 / cache-read 0.30 / cache-write(5m) 3.75 (same published tier as Sonnet 4.6) The other current ids the backfill warns about — claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5 — were already present. Extend tests/unit/test_pricing_override.py to assert each current Anthropic model resolves to its packaged rate, that none falls through to the default fallback, and that the dated YYYYMMDD variant resolves via the existing suffix-strip path. Co-Authored-By: Claude <noreply@anthropic.com>

…ns on --reingest Enabling [capture] after a Claude Code session is already ingested never landed message content / tool_input on the existing rows: the idempotent INSERT skips on span_id conflict, and the --reingest path updated only sub_agent_id, leaving the attributes column untouched. So Metabuilder-Labs#4's recurring-inclusion detection (which reads that content) only worked against a fresh DB. The --reingest UPDATE now also overlays the freshly-parsed span's attributes over the stored ones (parsed wins per key) via a small _merge_attributes helper. This ADDS the content keys (gen_ai.prompt.content / completion.content / tool.input) without discarding any keys the stored row already carried (e.g. from live ingest). Capture-off reingest stays a no-op — the parsed attributes are just {"source": ...}, which the stored row already has — so it never wipes content a prior capture-on backfill stored. Two tests added: (1) ingest with capture OFF, enable capture, re-run with --reingest -> existing spans now carry content/tool_input, no fresh DB, no new rows; (2) a capture-off --reingest does not delete previously stored content. Co-Authored-By: Claude <noreply@anthropic.com>

…laude history The zero-install first run (`tj quickstart` / bare `tj`) backfills ~/.claude/projects/*.jsonl into a transient in-memory DB before printing the headline. Profiling a 3000-session synthetic history showed the backfill dominates (99.7s) while the diagnostic (0.45s) and timeline (0.04s) are trivial — the cost is the per-span SELECT+INSERT round-trips in `_insert_session_idempotent` (120k spans => 240k DuckDB statements). A first run over a large history blew well past the <30s time-to-first- value goal Metabuilder-Labs#6 targets. Add a most-recent-N-sessions cap to the first-run path: `iter_claude_code_ sessions` gains an optional `max_sessions` that walks files mtime-desc and stops once the cap is reached, bounding both parse and insert work regardless of how large ~/.claude is. `ingest_claude_code` threads it through and sets `BackfillResult.limit_reached`. `cmd_quickstart` caps at the most-recent 300 sessions by default (≈9s even on an 8000-session history), exposes `--full` to lift the cap, and discloses the truncation honestly ("Showing your most-recent N sessions … run tj quickstart --full or tj context for your full history") so it never reads as "this is everything". The full `tj backfill claude-code` path passes no cap and is byte-for-byte unchanged. Tests assert BOUNDED WORK (only N sessions/rows ingested on a large synthetic history, the cap keeps the freshest sessions, the limit flag and CLI disclosure render) rather than a flaky wall-clock number. Co-Authored-By: Claude <noreply@anthropic.com>

…ction beyond file reads Metabuilder-Labs#4's recurring-inclusion detection only covered file Reads (group by `file_path`, frequency-count across sessions, suggest `@file`/CLAUDE.md). Repeated prompts, repeated searches, and large repeated tool outputs also re-paste across turns/sessions and burn the same quota, but went undetected. Generalize the detector into one `(inclusion_type, tool_name, signature)` aggregation carrying a distinct-session set + occurrence count, then flag four kinds, each with a type-appropriate structural fix: * file reads (Read/View/Cat `file_path`) -> `@file` / CLAUDE.md; * searches (Grep/Glob/Search query/pattern) -> pin / capture the result; * prompts (identical `gen_ai.prompt.content` re-sent) -> save as a slash-command / CLAUDE.md note; * large tool outputs (identical `gen_ai.tool.output` >= 2K chars re-pasted) -> reference the artifact instead of re-running. Each kind is independently capture-gated by its `[capture]` toggle (tool_inputs / prompts / tool_outputs), so default-off behavior is unchanged and a flag-off kind contributes nothing. File reads / searches gate on distinct sessions (a single session re-reading a file isn't structural); prompts / outputs re-paste across turns within a session too, so they gate on raw occurrence count. Framing stays quota-native (Metabuilder-Labs#4). Tool outputs are only captured on the live ingest path, not the on-disk transcript — the nudge note says so. `RecurringInclusion` now carries `inclusion_type`; the CLI renders a per-kind tag and the JSON payload round-trips it. Extends the synthetic fixture to exercise recurring Grep, prompts, and large outputs, asserting each is detected with frequency + the structural fix, plus a size-gate test and a per-flag gating test. Co-Authored-By: Claude <noreply@anthropic.com>

…med overhead source `tj context` (Metabuilder-Labs#4) named re-read overhead (cache reads) but not the two specific large sources Metabuilder-Labs#11 asked for: MCP schema-injection and prompt- cache-miss tokens. This ships the derivable half and parks the other with a precise note, keeping the honesty discipline (Rule 14) — no invented attribution numbers. Cache-miss (DERIVABLE): cache-creation tokens (NormalizedSpan.cache_write_ tokens, from the on-disk usage block's cache_creation_input_tokens) are input that missed the cache and had to be written to it, billed by Anthropic at a premium. They were already summed into total_cache_write_ tokens and the composition denominator but never named as their own overhead category — so the re-read and net-new-work shares silently failed to sum to 100% whenever cache writes were present. Now surfaced as a distinct `cache_miss` category: total_cache_miss_tokens / cache_miss_share on the diagnostic, cache_miss_tokens / cache_miss_share per turn, a new "Cache-miss:" breakdown line in the card (shown only when non-zero so default output is unchanged), and corresponding JSON fields. MCP schema-injection (~25K tok/call) (PARKED): not derivable from current data. Neither the on-disk transcript nor live spans carry per-tool / per-schema token attribution — tool-definition tokens are folded into input/cache-creation, and `mcp__`-prefixed names appear only on tool- invocation spans (no schema-injection token count). Surfaced as MCP_INJECTION_PARK_NOTE (a diagnostic note + a JSON field) naming exactly what data would be needed (a per-request tool-schema token delta) rather than fabricating a figure. Tests: extend the synthetic fixture so session B pays a cache-creation premium; assert the named cache-miss category is computed (core + per-turn + JSON), rendered in the card, and that the MCP half is parked with a precise note. Default-off behavior unchanged. Full suite green (1207 passed), ruff + mypy clean. Co-Authored-By: Claude <noreply@anthropic.com>

…abuilder-Labs#14) The claude-haiku-4-5 entry in pricing/models.toml carried the retired Haiku 3.5 rates (input 0.80 / output 4.00 / cache_read 0.08 / cache_write 1.00 per MTok) — identical to the claude-haiku-3-5 row. The entry existed, so it did not trip the Metabuilder-Labs#8 "No pricing data" warning, but it silently understated the Haiku-4.5 API-plan dollar calibration line. Update to the current published Haiku 4.5 rates (input 1.00 / output 5.00 / cache_read 0.10 / cache_write 1.25 per MTok, 5-minute cache-write column), verified against platform.claude.com/docs/en/about-claude/pricing. Haiku 3.5 keeps its own old rates for the Bedrock/Vertex date-suffix fallback. Tests: add claude-haiku-4-5 to the Metabuilder-Labs#8 current-Anthropic coverage dict and a dedicated regression asserting it no longer carries the retired 3.5 figures (and that 3.5 still does). Update the dollar/rate assertions across test_cost.py, test_cost_tracking.py, test_provider_attribution_194.py, and test_cost_charts.py that were pinned to the old Haiku rate. Co-Authored-By: Claude <noreply@anthropic.com>

…etabuilder-Labs#15) The Claude Code backfill ran one existence-check SELECT plus one INSERT per span in `_insert_session_idempotent` — ~2 DuckDB round-trips per span, so a large history (~120k spans) paid ~240k statements (~100s). Metabuilder-Labs#13 bounded only the quickstart first-run path; the full `tj backfill claude-code` / daemon path still paid the full per-span cost. Replace the per-span loop with a bulk path that processes a whole session in a BOUNDED number of statements regardless of span count: - Partition new-vs-existing span_ids in ONE chunked `WHERE span_id IN (...)` query (`_existing_span_ids`) instead of N existence SELECTs. - Bulk-insert the new spans in a single `executemany` (`_SPAN_INSERT_SQL` + `_span_insert_params`, kept in lock-step with `DuckDBBackend.insert_span`). - On `--reingest`, batch-load the existing rows' attributes in ONE chunked query (`_load_attrs_bulk`), compute the per-key attribute merge in Python, then apply the updates in a single `executemany`. The Metabuilder-Labs#10 idempotency + reingest contract is preserved exactly: new spans insert; existing spans without `--reingest` are skipped untouched; existing spans with `--reingest` get `sub_agent_id` updated and `attributes` per-key merged (overlay `{**stored, **parsed}`, parsed wins, never wipes a stored key). The no-`conn` fallback path is unchanged. Drops the now-unused single-span `_load_attrs` helper. Tests: all existing Metabuilder-Labs#3/Metabuilder-Labs#10/Metabuilder-Labs#13 backfill tests pass unchanged. Adds four Metabuilder-Labs#15 tests — bulk insert lands every new span; existing spans are skipped without `--reingest`; `--reingest` still merges attributes (overlay, no wipe) + updates sub_agent_id; and a counting-connection test asserting a 400-span session inserts in a BOUNDED number of statements (well under N execute() calls, via executemany) rather than ~2 per span. Co-Authored-By: Claude <noreply@anthropic.com>

…lar (Metabuilder-Labs#17) Metabuilder-Labs#2 (commit f845571) added a `framing` block to /api/v1/sessions/{id} and claimed "no UI change needed," but two dollar-bearing cells in the served SPA still called bare `fmtCost(...)`. For a Max-subscription user (framing.display_rule = suppress_dollars_for_subscription_share) this leaked raw dollars: the session-detail "Cost & Tokens" / "Implied API value" panel rendered e.g. "$198.9709" (and "$0.0000" on empty sessions), and the Status "Archived sessions" table cost column showed raw "$0.0000". The panel only relabeled the header for subscription users while still emitting the raw figure — the exact honesty leak Metabuilder-Labs#191/Metabuilder-Labs#2 exist to prevent. Both cells now route through the existing `fmtFramedDollar(value, framing)` helper, matching how Traces/Cost/Optimize/Analytics render: subscription users see token-share ("% of cycle"), local sees "—", and only api/unknown plans see raw "$". The `framing` block is already in scope in both views — SessionDetailView reads `data.framing` off /sessions/{id}; the Status view reads `data.framing` off /status (already consumed by the agent cards' "Cost today" cell). No backend change needed. Guarded by two new static-grep regressions in tests/unit/test_lens_ui_regression.py asserting the bare-fmtCost patterns are gone and the framed forms are present (there is no JS runner in the Python CI job; the served source is the unit under test). Co-Authored-By: Claude <noreply@anthropic.com>

…ion aggregate The Status archived-sessions list and the /api/v1/sessions/{id} detail view read tokens/cost straight off the denormalized `sessions` aggregate columns. Those columns are accumulated per span keyed by `span.session_id` in `IngestPipeline._build_or_update_session`. That is correct when telemetry stamps the session id on its cost spans (the /v1/logs Claude Code/Codex path and the on-disk backfill both do). But a fan-out harness posting raw OTLP to /api/v1/spans emits the zero-cost `invoke_agent` marker span WITH `session.id` while its cost-bearing `gen_ai.llm.call` spans carry only `agent_id` + a shared `traceId` (otlp_parsing reads the optional `attrs.get("session.id")`). Those cost spans arrive with session_id=None, so `_resolve_session` mints a throwaway UUID for each and they never accumulate onto the marker's session row. The row stays 0 even though the spend is real and surfaces correctly on Cost / Analytics / Traces, which key by agent_id / trace and never by session_id. The defensible association is the trace: the marker span (which carries the session_id) and the cost spans share a `trace_id` — exactly the join /traces already uses to attribute the same spans. New `session_token_cost_rollup()` in core/db.py rolls up every span whose `session_id` matches OR whose `trace_id` appears on a span carrying this session_id. It only follows trace linkage that originates from a span already bearing the session_id, so it never pulls in unrelated traces and never resorts to a lossy agent_id+time-window heuristic that could cross-attribute concurrent sessions. When the cost spans DO carry the session_id (the common case) the trace clause is redundant and the result equals the plain session_id sum — a strict superset that never under- or over-counts the already-correct paths. Each span is counted once (a disjunction over rows, not a self-join), so no double-counting. Returns None when the session has no spans, and both callers fall back to the stored row in that case. Status `_build_archive` and the session-detail `session` block + framing now report the rolled-up figures, reconciling per-session rows with /api/v1/cost. Covered by three integration tests over a fixture that mirrors the harness shape (session + marker carry session_id+trace; cost spans carry only trace+agent_id): they assert the session-detail row, the Status archived row, and reconciliation with /api/v1/cost's agent total. Co-Authored-By: Claude <noreply@anthropic.com>

…not live agents The Overview/landing tab rendered the "No data yet — TokenJam is listening" onboarding empty-state whenever the LIVE-agents list (/status `agents`) was empty — even when the DB was full of historical/backfilled spans (344M tokens, $23K) that Cost/Analytics/Traces all showed. Since `/status` `agents` carries only currently-active session tiles, an all-historical DB (nothing streaming) read as "all my data is gone" on the front door — a false data-loss scare during dogfooding. Reserve the onboarding empty-state for a genuinely empty DB. The Overview now decides "empty" from whether ANY data exists rather than from the live-agents list: window totals from /cost (`total_tokens` / `total_cost_usd` / `rows` — the same load-bearing signal the other screens use) OR any historical/live session in /status (`agents` or `archived`). /status moves into the existing parallel fan-out with a non-fatal `.catch` (a failing /status must not blank the Overview), preserving the Metabuilder-Labs#124 asymmetric error-handling contract (/cost stays load-bearing with no catch). Frontend-only fix. Guarded by a static-grep regression test asserting the buggy live-agents-only gate is gone and the has-data/has-session gate is present, plus an API test proving a historical-only DB (closed session + cost spans, no live agents) reports `agents == []` but a non-empty `archived` and non-zero /cost totals — the exact signals the new gate keys off. Co-Authored-By: Claude <noreply@anthropic.com>

…der-Labs#20) The Traces tab initialized `traces` to an empty array and rendered the "No traces yet" empty-state immediately on first paint — before the /api/v1/traces fetch resolved. On a slow fetch this flashed for ~0.5–1s even when the API returned real rows, misreading as "no data." Add a `loaded` flag (mirroring the Cost view's `loading` pattern) that flips true only once the first fetch settles. While `!loaded`, render a loading shimmer; the "No traces yet" empty-state is now gated on `loaded` so it can only appear after a fetch genuinely returned zero rows. The Cost view already uses a `loading` flag + shimmer for its chart body, so it does not flash a fake empty table; the residual `$0.0000` in the chart-header total is a separate, narrower change and is left out of scope per the ticket. Guarded by a static-grep regression test asserting the Traces view distinguishes the loading state (shimmer) from the loaded-empty state. Co-Authored-By: Claude <noreply@anthropic.com>

Bring mainline (v0.5.2, +107 commits: proxy/policy, Analytics, pricing overrides, Dashboard UI, request capture Metabuilder-Labs#209, resume-dedup Metabuilder-Labs#294) into the session-status branch. Notable reconciliations: - db.py migrations: kept main's RELEASED migrations 6 (policy tables) and 7 (request capture) at their version numbers; renumbered the branch's session migrations to 8-13. Safe because run_migrations tracks applied versions as a set, and 6/7 already shipped to production DBs. insert_span now carries all 28 columns; kept the _trace_filter_where refactor + the close_* methods. - backfill.py: combined main's dedup-by-message.id (Metabuilder-Labs#294) with the branch's sub_agent_id + content-capture + cache-write tracking; kept the bulk-insert path. Totals computed from deduped spans, including cache_write. - UI (index.html): adopted main's Dashboard direction (Overview retired), grafting the branch's StatusView project-grouping/archived/clickable cards, the Traces loading shimmer (Metabuilder-Labs#20), and PORTING the Metabuilder-Labs#19 backfilled-DB empty-state fix into DashboardView. - Bare `tj`: adopted main's branded home screen (Metabuilder-Labs#240); `tj quickstart` stays the explicit zero-install first run. Onboard project name now defaults to the derived/--project value (no interactive prompt) to keep main's plan-first flow. All 1397 tests pass; ruff + mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

anilmurty · 2026-06-25T14:29:53Z

look who's back! :) -- big change, will review thoroughly

anilmurty · 2026-06-25T14:53:23Z

@anshss this is seriously impressive, and it honestly made my week to see you back on this project.

I went through it properly, so a few specifics first. The lifecycle root cause is a great catch: the zero-duration invoke_agent span from Claude Code's logs export getting read as a completion, so every live session force-completed on its first prompt. That one's been quietly wrong on the Status page for a while and you nailed the diagnosis. The merge reconciliation was also really clean, keeping main's released migrations and renumbering yours to 8 through 13, folding in the recent message.id dedup, porting the empty-state fix. That made 15k lines a lot easier to follow.

Here's the honest lay of the land, and it's mostly about timing. You dropped off right as we were narrowing hard into cost-optimization, and the North Star moved while you were out: cost first, with observability as the substrate underneath rather than the headline. A lot of this PR is the session-timelines / agent-debugging direction you'd been building (the Timeline play-by-play, the Map, the subagent trees), which was exactly the right instinct for where we were back in May, and is genuinely beautiful work.

So rather than one big merge, I think the cleanest way to look at it is two lanes.

The cost lane, squarely where TJ is now, and I'd love all of it:

the lifecycle fix (it's also what makes per-session cost accurate)
sub-agent cost attribution plus the right-sizing analyzer
the cost/pricing correctness (cache-creation split, current rates, sessions framing, bulk-insert perf)
tj context, the tokenmaxx quota card, the Opus quota audit

I'd love to land the lifecycle fix first and on its own since it's a real bug and deserves to ship fast, then the rest of the cost pieces as focused follow-ups.

The observability lane (aka old OCW focus), which is more of a conversation than a merge:

the session-detail Timeline, the Map, run grouping, the session-status tiers
not because it isn't great (it is), but because pure "what did my agent do" debugging is the crowded category we deliberately stepped back from (and chose to work with existing solutions rather than build our own). The interesting part is your Map already fuses cost into the activity view ($110, "1 flagged for right-sizing"), and that version, where it's really answering "where did the spend go and what's over-provisioned," is very much on-mission. So it's less "drop this" and more "let's figure out together which pieces we point at waste and spend versus pure activity."

Can we jam on that part? I'll DM you too. Short version: the cost pieces I want now, lifecycle first, and the timeline/map work I want to think through with you so we aim it at where TJ is headed rather than where it was.

Either way, really glad to have you back on this.

anilmurty

documented in previous comment. After our discussion we can keep all the feature changes but ideally should split into separate PRs to limit risk

anshss and others added 30 commits May 27, 2026 10:15

fix(ui): place restore-session button top-right on the session headin…

67db92c

…g row Move the restore button onto the title row, right-aligned next to the session heading, instead of stacked under the Session id line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

anshss and others added 25 commits June 11, 2026 15:40

anshss requested a review from anilmurty as a code owner June 25, 2026 13:08

Merge branch 'main' into fix/claude-code-session-status

0906e8a

anilmurty requested changes Jun 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart#306

feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart#306
anshss wants to merge 74 commits into
Metabuilder-Labs:mainfrom
anshss:fix/claude-code-session-status

anshss commented Jun 25, 2026

Uh oh!

anilmurty commented Jun 25, 2026

Uh oh!

anilmurty commented Jun 25, 2026

Uh oh!

anilmurty left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

anshss commented Jun 25, 2026

Summary

Merge with main (v0.5.2)

Tests / Verification

What's NOT in this PR

Uh oh!

anilmurty commented Jun 25, 2026

Uh oh!

anilmurty commented Jun 25, 2026

Uh oh!

anilmurty left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Merge with `main` (v0.5.2)