Skip to content

feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart#306

Open
anshss wants to merge 74 commits into
Metabuilder-Labs:mainfrom
anshss:fix/claude-code-session-status
Open

feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart#306
anshss wants to merge 74 commits into
Metabuilder-Labs:mainfrom
anshss:fix/claude-code-session-status

Conversation

@anshss

@anshss anshss commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Turns the Status page from a flat agent list into a real session-status dashboard for Claude Code — fixing the root bug where live sessions showed as "completed", then building the drill-in surfaces (detail / timeline / map), cross-session run grouping, and a zero-install first run. Rebased onto and merged with current main (v0.5.2).

Summary

  • Root fix: Claude Code/Codex export OTLP logs (not traces); each user_prompt becomes a zero-duration invoke_agent span that ingest mistook for a session completion — so every live session force-completed on its first prompt. Completion is now gated on real duration (end_time > start_time); non-end spans re-activate mistakenly-completed sessions.
  • Session lifecycle: active / idle / stale / closed tiers (SessionRecord.status_at), close-signal endpoint + tj session-end, archive view, per-terminal tiles, display labels (service.instance.id + manual overrides), and a claude shell wrapper for per-terminal naming.
  • Project grouping: dashboard tiles roll up by OTel service.namespace, with a server-side [agents.<id>].project fallback so already-running sessions group without a restart.
  • Session drill-in: a per-session detail view with model-mix + context-growth, a deterministic Timeline (transcript play-by-play incl. recursive subagent logs, no LLM), and a graphical Map of what the agent did per ask.
  • Cross-session Run grouping: a fan-out harness stamps tokenjam.run_id / parent_session_id; tj groups spawned workers into one Run (/api/v1/runs), plus a setup_harness MCP tool + integration doc.
  • Sub-agent cost attribution: sub_agent_id on spans, per-subagent cost breakdown, and a subagent right-sizing analyzer.
  • Zero-install first run + Claude-Code wedge: tj quickstart (npx tj quickstart / uvx), tj context cost diagnostic, tj tokenmaxx quota card, retroactive Opus quota audit, branded home screen on bare tj.
  • Cost/pricing correctness: capture cache-creation tokens separately from reads, current Anthropic/Haiku-4.5 rates, framing block on /api/v1/sessions/{id}, bulk-insert backfill (perf).

Merge with main (v0.5.2)

This branch diverged at v0.5.0 and now merges +107 mainline commits (proxy/policy, Analytics, pricing overrides, Dashboard UI, request-capture #209, resume-dedup #294). Notable reconciliations:

Tests / Verification

  • Full suite green: 1397 passed (pytest tests/unit tests/synthetic tests/agents tests/integration).
  • ruff check + mypy clean across touched files.
  • UI guarded by static-grep regression tests (test_lens_ui_regression.py) + offline test (test_ui_offline.py); module JS validated with node --check.
  • Diff scope: 88 files, +15,552 / −648 vs main.

What's NOT in this PR

  • Live OTel autopsy of subagent trees — CC's live OTLP export is a flat 2-level tree; deep subagent reconstruction is deterministic from the on-disk JSONL only (Timeline path), not the live span path.
  • LLM-generated intent summaries — the Timeline/Map are deterministic (no generation) by design.
  • Active-time (idle-segmented) durations — tiles still show wall-clock; idle-gap segmentation is deferred.
  • Drift on logs-path sessions — needs an idle-sweep to fire session-end hooks; deferred.

🤖 Generated with Claude Code

anshss and others added 30 commits May 27, 2026 10:15
…sions

Claude Code / Codex export telemetry as OTLP logs. routes/logs.py maps each
user_prompt event to a zero-duration invoke_agent span (end_time == start_time)
that marks the START of a turn. ingest.py treated any invoke_agent span with a
truthy end_time as a session completion, so every live session was force-
completed on its first prompt and the drift/alert session-end hooks fired on
every turn.

Result on the dashboard: an actively-running session displayed as "completed"
with 0 duration and 0 tokens (the Status tile surfaced whichever empty marker
session was newest), while real cost only showed in the daily roll-up.

Fix:
- ingest._is_session_end(): only treat an invoke_agent span as a completion
  when it has a real duration (end_time > start_time). The SDK @watch() path
  still completes correctly; the logs-path turn markers no longer do.
- Ongoing-activity spans re-activate a session left "completed" (self-heals
  in-flight sessions). Idle sessions are surfaced as "stale" at read time via
  SessionRecord.effective_status (existing SESSION_STALE_THRESHOLD).
- status route: pick the active session by most-recent activity
  (COALESCE(ended_at, started_at) DESC) and report effective_status, so a live
  session wins over a freshly-spawned empty marker.

Tests: factory make_invoke_agent_span(); 4 lifecycle regression tests +
1 DuckDB-backed status-route test proving the live session reads as active.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Status dashboard only showed per-repo agent tiles (claude-code,
claude-code-harness, claude-code-godmode...), because tj names each agent
after the git repo (cmd_onboard._derive_project_name). A user working across
one org's repos saw their work scattered across many tiles with no project
identity.

Add OTel service.namespace as a "project" grouping dimension:

- onboard: derive the git org (_derive_org_name) and write
  service.namespace alongside service.name into the project's
  .claude/settings.json. New --project flag overrides it (e.g. org
  "aquanodeio" -> "aquanode").
- ingest: capture service.namespace from resource attrs on every path —
  logs (Claude Code / Codex), live OTLP spans, and the in-process SDK —
  and persist it on the SessionRecord (late-resolved like plan_tier).
- db: migration 5 adds sessions.service_namespace (nullable); upsert +
  _row_to_session round-trip it.
- status API: each agent now reports its namespace.
- dashboard: tiles group under a project header showing agent count and
  rolled-up cost; agents with no namespace fall under "Ungrouped".

Net effect: every Aquanodeio/* repo rolls up under one "aquanode" project
tile, no per-repo config needed.

Tests: factory service_namespace support; ingest capture + late-resolve;
logs-path resource extraction; status-route end-to-end; org-name parsing
(https/ssh); migration count. Dashboard grouping verified in-browser.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two dashboard-accuracy fixes:

1. Duration: the Status tile picked the "latest completed" session by
   started_at, so a 40s fragment that started 3 min after the real session
   hid a 4.5-hour session that stayed active afterwards. Order
   get_completed_sessions by last activity (COALESCE(ended_at, started_at)
   DESC) instead. Fixes the tile across the status route, MCP, CLI status,
   and metrics — all of which take the limit=1 "current session".

2. Onboard now prompts for a project name (OTel service.namespace) with the
   git repo name as the default, instead of silently deriving the git org.
   A meta-repo (git repo "harness" holding all of "aquanode") can be named
   "aquanode" at onboard time. --project still skips the prompt for
   non-interactive use. Removes the now-unused _derive_org_name.

Tests: ordering regression (long session vs later-started fragment);
make_session gains started_at/ended_at overrides; onboard --project and
interactive project-name prompt; existing onboard CLI tests pass --project.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
service.namespace rides in OTEL_RESOURCE_ATTRIBUTES, which an agent reads at
startup — so an already-running Claude Code session can never start sending it
without a restart. Add a server-side fallback: [agents.<id>].project maps an
agent to a dashboard project, applied by tj regardless of what the wire carries.

- config: AgentConfig.project (round-trips via _dc_to_dict / loader).
- ingest: session.service_namespace falls back to the agent's configured
  project when the span carries none (so running sessions self-heal on their
  next span); an explicit wire namespace still wins.
- status route: namespace falls back to the configured project at query time,
  so even already-completed sessions with NULL namespace group immediately —
  no backfill, no restart.
- onboard: writes the chosen project into [agents.<id>].project too, so both
  the wire (future sessions) and server-side (running sessions) paths are set.

Verified end-to-end: a session sending only service.name=claude-code-harness
(no namespace) reports namespace=aquanode and its true 4h+ duration after just
a daemon restart.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…inals)

Several Claude Code terminals in one repo share a service.name, so they all
report under one agent_id. The Status view collapsed an agent to a single
representative session, hiding concurrent terminals.

Now the status route emits one tile per recently-active session (last span
within SESSION_STALE_THRESHOLD), falling back to the latest completed session
when none are active. Each tile carries its own session_id, tokens, tool
calls, duration, last-seen, per-session cost, and per-session alert count
(when an agent has multiple). The dashboard renders a session-id line per tile
and counts "N sessions" per project group (cost deduped by agent).

Tests: concurrent-sessions-under-one-agent yields one tile each; updated the
live-vs-marker test (both now show as their own tile).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Concurrent terminals show as opaque session-id tiles. Give them names two
ways, resolved in priority order on the status route:
1. [session_labels] config override (full id or prefix match) — names an
   already-running terminal immediately, no relaunch.
2. OTel service.instance.id — durable per-terminal name set at launch via
   OTEL_RESOURCE_ATTRIBUTES; captured like service.namespace.
3. else None (UI falls back to the short session id).

- config: TjConfig.session_labels dict ([session_labels] table).
- service.instance.id captured across logs / OTLP / SDK paths onto
  SessionRecord.service_instance_id (migration 6); late-resolved.
- status route: _session_label() resolves the name; each tile gets "label".
- dashboard: tile header shows the label when present (agent id + short
  session id move to detail rows), else the agent id as before.

Tests: instance.id capture; label priority (config > instance.id > none);
session_labels round-trip; migration count 5->6.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rd wrapper

Add a `claude` shell wrapper (installed by `tj onboard --claude-code`) so each
terminal tags its session with a distinct `service.instance.id`, rendering as
separate dashboard tiles — without hand-editing shell rc and preserving the
project's `service.name` / `service.namespace`.

- New `tj otel-resource-attrs` command (cmd_otel.py): prints the project's OTel
  resource attrs on a single bare line (`service.name=claude-code-<repo>` plus
  `service.namespace=<project>` when the agent has a project configured).
  Registered in main.py and added to `no_db_commands`.
- `tj onboard --claude-code` installs an idempotent `claude()` function into
  `~/.zshrc` (always) and `~/.bashrc` (when present), behind begin/end markers
  so re-onboards replace in place. The wrapper consumes `--as <name>`, derives
  the instance id (--as value, else tty basename, else `unknown`), exports
  `OTEL_RESOURCE_ATTRIBUTES`, and runs `command claude` to avoid recursion.
  Portable across zsh and bash.
- Tests for both the new command (with/without configured project) and the
  wrapper install (presence, idempotency, bashrc-only-when-present).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rchive

Claude Code emits no "session closed" event, so tj can only see
time-since-last-span. Model the lifecycle explicitly and let the `claude`
wrapper report closes:

- models: effective_status gains idle/closed tiers (active <5m, idle <4h,
  stale beyond); status_at(idle_threshold) keeps the property pure while the
  route honours configurable [sessions] idle_minutes (default 240).
- config: TjConfig.session_idle_minutes round-trips via the [sessions] table.
- db: close_sessions_by_instance / close_session_by_id (idempotent, only
  flips status='active' rows, bumps ended_at).
- api: POST /api/v1/sessions/close (ingest-Bearer-protected) marks a
  terminal's active sessions closed; returns {"closed": n}.
- cli: `tj session-end --instance/--session` POSTs best-effort over HTTP
  (no DB), exits 0 silently when the daemon is down.
- wrapper: claude() reports session-end on return and via INT/TERM/HUP trap,
  preserving claude's exit status. Idempotent close makes double-fire safe.
- onboard: stop writing OTEL_RESOURCE_ATTRIBUTES to project settings.json
  (the wrapper owns it per-terminal; settings env would override it and
  collapse tiles) and delete any pre-existing one to migrate older setups.
- status route: only active+idle sessions become tiles (no completed/closed
  fallback), capped at 6 per agent with a surfaced overflow count; new
  `archived` list returns closed+stale sessions (cap 50).
- ui: idle (amber) + closed styles, an Archived sessions table, and a
  "+N more" affordance for capped project groups.

Gates: ruff clean, mypy clean, 639 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Clicking a Status tile navigated to the global `#/traces` firehose, dropping
the clicked session's identity and landing the user on an unfiltered,
all-agents trace list. The header promised "details" but zoomed out. Add a
real session-scoped destination instead.

- GET /api/v1/sessions/{session_id}: per-session rollup (cost/tokens/tools/
  alerts/drift) plus the session's own traces, so the UI can drill into the
  existing waterfall. Read-only; require_api_key; 404 as JSONResponse with
  response_model=None. Parameterised db.conn SQL only.
- UI SessionDetailView + router case + click fix (index.html:680 now routes to
  `#/sessions/<id>`, archived rows clickable). Plan-tier-honest cost framing:
  "Implied API value" for subscription, "Local model — no API cost" for local,
  real cost for API; no invented spend, no fabricated agent tree (the live
  Claude Code telemetry is flat — documented in the route).
- Stop tracking .tj/config.toml: every `tj` run from the repo cwd rewrites it
  and rotates the committed ingest_secret. The `.tj/` ignore rule already
  exists; this just drops the stale tracked copy.

Tests: +5 integration tests (rollup/tools/traces, unknown→404, subscription
plan_tier→pricing_mode, drift baseline, api-key auth). ruff + mypy clean,
644 pass.

Layer 1 of the run-autopsy arc; subagent-tree capture at ingest (Layer 2) is
the next step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…view

Layer 2 of the run-autopsy arc. The session detail view now shows how a
session split across models and how its context grew over the run — both
derived from existing gen_ai.llm.call spans, so it works on every session
(live + backfilled) with no schema change.

- GET /api/v1/sessions/{session_id} gains turn_count, model_mix (per-model
  calls/tokens/cost rollup) and context_series (time-ordered input-token
  series, downsampled to <=120 points, first+last preserved). Parameterised
  db.conn SQL; span name from the GenAIAttributes semconv constant.
- UI "Models & context" section: model-mix table + a dependency-free CSS bar
  chart of input (context) tokens per LLM call. Descriptive only — no model-
  routing-quality claims; subscription caveat reused on cost.

Tests: +3 integration tests (model_mix aggregation/order, turn_count,
context_series ordering + downsample cap). ruff + mypy clean, 647 pass.

Deferred (recorded in .plans): parentUuid within-session branch tree; the
cross-session spawn graph (needs a harness-emitted spawn marker — no clean
parent->child link exists in on-disk Claude Code data).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…unit

Layer 3 of the run-autopsy arc. A fan-out harness (e.g. the meta-repo governor)
stamps tokenjam.run_id (and optional tokenjam.parent_session_id) as OTel resource
attributes on each worker session it spawns; tj groups those sessions into one
Run. Linkage is DECLARED by the spawner — Claude Code OTLP carries no native
parent<->child edge, so it is never reverse-engineered.

- semconv: TjAttributes.RUN_ID / PARENT_SESSION_ID.
- SessionRecord gains run_id + parent_session_id; migration 7 adds the columns
  (ADD COLUMN IF NOT EXISTS — fresh-DB and upgrade safe).
- Ingest captures the markers from resource attributes on both paths
  (otel/otlp_parsing.py for the spans/OTLP path, api/routes/logs.py for the
  Claude Code logs path) and self-heals null-on-update (never overwrites).
- API: run_id/parent_session_id on GET /api/v1/sessions/{id}; new
  GET /api/v1/runs/{run_id} (totals + member sessions + parent-edge tree) and a
  GET /api/v1/runs index. 404 as JSONResponse + response_model=None.
- UI: RunDetailView (#/runs/<id>) — run rollup + sessions indented by spawn
  parent; "Run" link on the session detail. Plan-tier-honest cost framing
  (mixed -> implied API value, never a hard spend claim).

Tests: +7 integration tests (logs+spans ingest capture, session-detail
exposure, run grouping/aggregation/tree, unknown->404). ruff + mypy clean,
654 pass.

Harness side is a documented contract (set the resource attrs at spawn),
applied separately in the user's governor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
L3 added tokenjam.run_id / tokenjam.parent_session_id resource-attribute capture
to the HTTP/OTLP (otlp_parsing.py) and Claude Code logs (logs.py) paths but missed
convert_otel_span — so a Python SDK app using the in-process exporter would never
join a Run. Extract the markers there too, mirroring the existing service.namespace
/ service.instance.id extraction, so run grouping works across all three ingest
paths (CC logs, HTTP/OTLP incl. the TS SDK, and the in-process Python SDK).

+2 unit tests for convert_otel_span resource extraction (also the first direct
coverage of that block). ruff + mypy clean, 656 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A "Story" section in the session view that explains, step by step, what a Claude
Code session was trying to do and how it went — surfacing the agent's own
narration threaded with its literal tool calls and ok/error outcomes. No LLM, no
generation: read live from the on-disk CC JSONL transcript, nothing stored in the
DB (capture posture unchanged).

- core/transcript.py: build_session_story() locates
  ~/.claude/projects/*/<session_id>.jsonl (session_id == transcript filename,
  verified 100% across cli + sdk-cli), parses task / steps (narration + per-tool
  {name, label, status} + is_error/is_retry flags) / outcome. Caps + truncation;
  NEVER returns full tool inputs/outputs — only a short arg label + ok/error
  (privacy + bounded payload).
- GET /api/v1/sessions/{id}/story (require_api_key, response_model=None);
  {available: false} at HTTP 200 for SDK/no-transcript sessions. Projects root
  overridable via app.state / TJ_CLAUDE_PROJECTS_ROOT for tests.
- UI Story section: Task callout, step list (expandable narration, tool chips,
  error tint, ↻ retry marker, omitted markers), Outcome callout. Dependency-free.

CC-only by design (SDK sessions have no CC JSONL → graceful unavailable state).
+15 tests (unit parser + API available/unavailable). ruff + mypy clean, 671 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Alerts / Traces)

The session card stacked six sections vertically — too much scroll. Split the
lower content into tabs while pinning the header + Overview/Cost summary cards at
top. Story is the default tab; Tools folds under "Models & context", Behavioral
drift under "Alerts". Pure layout change (htm/Preact + CSS), no backend or
data-flow change; data still loads as before, only display is gated by the active tab.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
StatusView's empty-state guard only checked data.agents (active + idle), so once
all sessions aged past the 4h idle window into `archived`, the dashboard showed
the cold-start "No agents found / pip install" screen despite having 50 archived
sessions to display. Guard now also requires `archived` to be empty before
showing the cold-start state, so the (clickable) archived-sessions table renders
whenever there is any history.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Agent/Task step in a session's Story now expands into that subagent's own
story (task -> steps -> outcome), recursively, so a session's full log includes
the work of everything it spawned. Subagent transcripts are read from
~/.claude/projects/<proj>/<session_id>/subagents/agent-<agentId>.jsonl, linked by
the agentId carried in the parent step's tool_result (exact match, no heuristic).

- core/transcript.py: include_subagents (default true) resolves each Agent/Task
  step's child agentId from its tool_result, loads agent-<id>.jsonl from the root
  session's subagents/ dir, and nests its story under the step (step.subagent),
  recursively. Guards: MAX_SUBAGENT_DEPTH, a shared step-budget across the tree,
  and an agentId cycle-set; depth/budget caps are surfaced, never silent-dropped.
  Privacy unchanged at every depth (narration + short tool label + ok/error only).
- GET /sessions/{id}/story: Agent steps carry a recursive `subagent`; ?subagents=false
  returns the flat single-session story.
- UI: collapsed "> subagent: <name> - N steps" disclosure under each Agent step;
  expands to the subagent's task/steps/outcome indented, recursively (its own Agent
  steps expand too).

+9 tests (parent->child->grandchild nesting, agentId resolution, depth/budget caps,
cycle guard, ?subagents=false). ruff + mypy clean, 680 pass. Real-data: this session
nests 12/12 spawned subagents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session activity section was a raw wall of full-card steps (a 356-step session
rendered ~32k px tall). Make it scannable and address the naming/order feedback.

- Rename the user-facing tab + section "Story" -> "Timeline" (the /story endpoint
  path is unchanged/internal).
- Newest-first display: renderStepsNewestFirst() reverses the step list (top level
  AND nested subagents) while keeping the #n labels (so Metabuilder-Labs#1 is still the first action).
  The Task callout stays pinned at top and Outcome at bottom; only the steps reverse.
- Compact rows: each step is now a one-line row (#n - time - tool chips ok/error -
  clamped first line of narration); click to expand the full narration + detail. The
  repetitive per-row model label is dropped and shown only when the model changes
  (prevModel). Error tint, retry markers, and the recursive subagent disclosure are
  preserved.

UI-only (tokenjam/ui/index.html); ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session-detail Timeline (StorySection) fetched /sessions/:id/story once
on mount with no polling, unlike every other view. A live session's Timeline
froze at page-load and only refreshed on remount (tab switch / re-navigate).
Add setInterval(load, 10000) matching the house idiom, preserving the
last-good Timeline on transient poll failures. Apply the same polling fix to
TraceDetailView, which had the identical once-only fetch (selection is by
span_id, so re-fetching spans preserves it).

Also remove the pinned top-level Task and Outcome callouts: in a long-running
session the first prompt goes stale as new prompts are sent, and Outcome was
just the last assistant message (already the top step) mislabeled as a final
outcome on a live session. The steps list already shows everything in
descending time order. Per-subagent Task/Outcome callouts stay (scoped, not
stale).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Timeline step's text was hard-trimmed to 400 chars server-side, with the
full narration discarded. The UI's expand-a-step feature could therefore only
ever reveal 400 chars ending in "…" — "show more" was lying, since the rest was
never sent. Raise MAX_STEP_TEXT_CHARS to 100K so it acts as a safety guard
against a pathological single blob rather than a preview trim; the UI already
shows only the first line collapsed and the full text when expanded. Real
assistant responses now render complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ended_at is a session's last-activity time, shown as "Last seen" in the UI.
close_session(s) advanced it to the close moment (CASE ... ended_at < now),
so a session closed days after its last span reported "Last seen 2h ago" and a
multi-day duration, while its actual last telemetry/transcript message was days
earlier. Closing is an admin action, not telemetry.

- Both close methods now use ended_at = COALESCE(ended_at, now): stamp only
  when NULL, never advance a real last-span time.
- Migration 8 repairs already-closed rows: recompute ended_at from each closed
  session's max span time, lowering only (idempotent, leaves consistent rows).

Tests: close preserves ended_at / stamps when NULL (integration); backfill
lowers bumped rows, leaves consistent rows, ignores active sessions (unit).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two status-page UX fixes:

- Scroll memory: returning from a session/run detail page (e.g. clicking an
  archived session, then Back) landed at the top of the Status page instead of
  where you were. The window scrolls (.main#app has no own overflow) and the
  async list load defeats native scroll restoration. Add a useScrollMemory hook
  that saves window scroll per view and restores it once the list has rendered.

- "restore session" button on every archived (closed/stale) and idle session.
  Clicking copies `claude --resume <session-id>` to the clipboard so the
  session can be picked back up in the terminal. Copy uses execCommand inside
  the click gesture (reliable even when the async Clipboard API hangs on a
  permission prompt) with a best-effort navigator.clipboard upgrade, and
  stopPropagation so it doesn't trigger the row/tile navigation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "restore session" button belongs on the session detail page (where you're
looking at one archived/idle session), not on the Status list. Move it: render
it under the session title in SessionDetailView when the session is not active
(closed / stale / idle), and remove it from the Status page archived table and
idle tiles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g row

Move the restore button onto the title row, right-aligned next to the session
heading, instead of stacked under the Session id line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a copy glyph to the restore-session button and a styled on-hover tooltip
that tells you to paste & run the command in your terminal, showing the exact
`claude --resume <id>` line. Replaces the plain native title tooltip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude Code (and the OTel/SDK paths) emit cache_creation_tokens, but every
ingest parser dropped them — only cache_read landed in the single cache_tokens
field. The dashboard's "Cache tokens" therefore understated total cache usage
by the cache-write amount, and cache-write tokens were never priced.

Add a first-class cache_creation_tokens field through the stack while keeping
cache_tokens = cache reads (analyzers price it at the cache-read rate):

- models: cache_creation_tokens on NormalizedSpan + SessionRecord
- db: migration 9 adds the column to spans + sessions; wired through
  insert/upsert/row-mappers
- ingest: aggregate cache_creation_tokens per session
- cost: CostEngine now prices cache writes at the cache-write rate
- ingest paths: logs.py (Claude Code), otlp_parsing.py, provider.py; backfill
  realigned to the same convention (was the lone outlier)
- api/dashboard: expose cache_creation_tokens (sessions/runs/traces);
  "Cache tokens" now renders reads + writes

Tests: +2 regression tests; updated tests asserting the old behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reconcile the branch's session-detail / dashboard / transcript / run-grouping
work with 98 commits from main. Notable conflict resolutions:

- Cache read/write split: main shipped the same fix as `cache_write_tokens`
  while this branch used `cache_creation_tokens`. Converged on main's
  `cache_write_tokens` everywhere (models, db, ingest, cost, logs, otlp,
  provider, backfill, api routes, ui). Kept the branch's session-level
  aggregation + dashboard display (main only tracked it at span level).
- Migration number collision: both branches added migration 5+. Kept main's
  released migration 5 (cache_write on spans), renumbered the branch's to
  6-9, and added migration 10 for session-level cache_write (with a defensive
  spans IF NOT EXISTS to repair DBs that consumed v5 as the namespace column).
- backfill: kept the branch's read/write split (more correct than main, which
  still summed both into cache_tokens), under the cache_write_tokens name.
- dashboard: took main's offline-vendored preact/uplot imports + optimize
  route + params plumbing; kept the branch's StatusView (project grouping,
  archived sessions, click-to-detail), SessionDetailView, and run routes.
  "Cache tokens" now shows reads + writes.
- tests: unioned test_cli additions from both sides; kept the branch's
  dynamic migration-count assertions in test_db.

Verified: 836 tests pass, ruff + mypy clean, dashboard loads headless with no
console errors and renders the correct cache-token total.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude Code records each Task-tool subagent's turns in
<session>/subagents/agent-<id>.jsonl, tagged with the parent's sessionId.
Backfill folded those spans under the parent session but discarded the
subagent identity, so a session's cost could not be broken down per
subagent -- yet a single research run can spawn 100+ subagents that drive
most of the spend (verified on a real session: 66% of $642 across ~147
subagents, previously invisible inside one parent total).

- NormalizedSpan.sub_agent_id + spans.sub_agent_id column (migration 11)
- backfill sets it from a record's top-level agentId when isSidechain is
  true; None on the main thread
- both span write paths (db.insert_span, backfill) + _row_to_span carry it
- make_llm_span factory gains a sub_agent_id arg

Enables GROUP BY sub_agent_id for per-subagent breakdown / right-sizing.

Known limitation (separate latent bug, not addressed here): backfill
upserts the session row once per file with replace semantics, so
sessions.total_cost_usd reflects only the last-processed subagent file.
Span-derived cost (get_session_cost / get_cost_summary) stays correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A Claude Code session is split across files sharing one session_id (main
thread + subagents/agent-*.jsonl). Backfill upserts the session row once
per file, and upsert_session uses replace semantics, so the row ended up
holding only the last-processed subagent file's totals instead of the sum
-- sessions.total_cost_usd and token counts were wrong for any multi-agent
session.

After ingesting, reconcile each touched session row's token + cost
aggregates to SUM over its spans (the source of truth), via the new
DuckDBBackend.recompute_session_totals_from_spans. Scoped to the sessions
backfill actually touched, so live-ingested rows are untouched. Idempotent:
re-running backfill also repairs rows written by an earlier (pre-fix) run.

Span-derived cost (get_session_cost / get_cost_summary) was already
correct; this fixes the stored sessions.* aggregates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New optimize analyzer that breaks a window's cost down per subagent
(sub_agent_id) and flags structural right-sizing candidates:
  - over_powered:     premium (Opus-tier) model, little output, few tool calls
  - over_provisioned: large context (input + cache) but little output

Honesty discipline (CLAUDE.md Rule 14): candidate flags only, never a
quality claim; the caveat is surfaced verbatim and the recoverable estimate
is left None (we report the spend concentrated in flagged subagents, not a
guaranteed saving).

Registered as "subagent" (auto-discovered; appended to ANALYZER_ORDER), so
it flows through get_optimize_report (MCP) and /api/v1/optimize via a dict
round-trip constructor, with a pricing-mode-aware CLI renderer.

Verified on a real session: 147 subagents = 66% of a $642 window, of which
25 are flagged ($109) -- Opus subagents fed 600K-1M cache tokens that
produced <1K output (over_provisioned).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Backfill is idempotent by span_id, so a plain re-run skips spans already in
the DB and never populates sub_agent_id on history ingested before that
column existed. --reingest UPDATEs the existing spans in place (sub_agent_id
refreshed) instead of skipping them -- no new rows, no duplicates -- so
accumulated history becomes attributable per subagent. Surfaced via the new
BackfillResult.spans_retagged counter and the CLI summary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
anshss and others added 25 commits June 11, 2026 15:40
Make run linkage work for any fan-out harness, not just ones that announce a run
id. A harness groups its sessions by stamping the OTel resource attribute
tokenjam.run_id on the launcher + every worker; tj then groups them
automatically.

- core/harness_setup.py: generates the drop-in run-env helper (mints one run id
  per launch, exports tokenjam.run_id), the Python equivalent, and a bounded,
  prioritized scan for a repo's spawn points (harness-like files first so a big
  monorepo doesn't exhaust the budget; relative-path ranking; narrow regex to
  avoid false-positive floods).
- mcp/server.py: setup_harness(mode="instrument"|"map") — instrument writes the
  helper + reports spawn points and exact wiring for Claude to apply (never
  blind-edits harness code); map makes no changes and reports how tj already
  groups the harness's sessions. Verified against the real aquanode governor.
- docs/harness-integration.md: the three-tier contract (automatic Task subagents
  / tokenjam.run_id convention / best-effort inference) + the honest boundary;
  linked from the README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
…ration)"

A pure tool turn (the model fired a tool with no text that turn) rendered as a
dim "(no narration)" — you had to expand the row to see what it did. Surface the
tool's label/command inline instead (Bash command, Read/Edit file path, etc.),
monospace, for any tool. The inline chips still carry the tool name + ok/error
status; expand still shows the untruncated detail. Faithful: narration, when
present, still wins.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
A failed tool step was boxed in red but gave no reason — you had to open the
transcript to learn why. Capture the tool_result's error text (wrapper-stripped,
length-capped) on the step and surface it in the expanded body. Remove the red
border around the row; the ✗ chip plus the expanded error block carry the signal.
Descriptive only — the message is the transcript's verbatim text.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Pressing "Distill titles" worked but felt broken: it reverted on reload, gave
weak feedback during the ~10s call, and showed "couldn't reach claude" even when
it simply had nothing to distill.

- Auto-apply on load: a cache-only peek (core.distill.peek_cached_titles, never
  calls claude) re-applies an already-distilled session, so a press sticks across
  reloads/navigation at zero cost. The /distill route gains ?cached_only.
- Honest note: the route returns candidate_count so the UI tells "nothing long
  enough to distill" (success) apart from "claude unreachable" (failure); button
  shows Distill titles / Distilling… / Distilled ✓.
- Feedback: a brief highlight flash on the headlines when a distill applies.

Verified live: cached titles auto-apply on load with no claude call; manual
distill flashes + updates the note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
…ration)

11 days of Claude Code use produced 14,846 alerts, 96.6% of them false:
retry_loop fired on normal repeated tool use because it keyed on tool NAME only
(4 different Bash/Read/WebSearch calls in 6 spans), counted non-execution
tool_decision event spans, and CC OTLP carries no tool arguments to tell an
identical repeat apart. drift treated heterogeneous interactive sessions as one
baseline (every longer answer "drifts"); session_duration flagged hour-long
governor runs against a 3600s SDK-era default.

- retry_loop now fires only on a GENUINE loop: same tool + identical arguments,
  counting only gen_ai.tool.call spans. A privacy-safe argument signature
  (tokenjam.tool_arg_sig) is computed at ingest before the raw tool input is
  stripped, so it works under any capture config; telemetry with no args (CC
  over OTLP) yields no signature and never trips this. Genuine CC retries stay
  visible in the Map/Timeline (transcript-derived is_retry).
- drift + session_duration are skipped for interactive coding agents
  (claude-code* / codex*) where they're structurally meaningless; fully active
  for SDK / production agents with stable workloads.
- shared utils/signatures.tool_arg_signature; make_tool_span gains tool_input/
  name; the retry-loop demo models a real identical-args loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
failure_rate detection is correct (it caught real rate-limits, context-window
overflows, timeouts, server errors), but it re-fired on every additional error
past the threshold — one struggling session emitted ~8 near-identical alerts
(5/20, 6/20, 7/20 …), inflating a few incidents into dozens of rows. Track
sessions that have already alerted and fire at most once per session. Replaces
the oscillation-guard heuristic with a clean per-session latch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Reconcile the session-status redesign (per-session tiles + archive,
SessionDetailView, scroll memory, RestoreButton, work map) with main's
framing rollout (Metabuilder-Labs#191), active-compute-time (Metabuilder-Labs#147), backfill plan-tier
propagation (Metabuilder-Labs#176), and the upsert never-clobber-known-tier CASE.

Conflict resolutions:
- status.py: union imports; keep tile/archive structure, compute
  active_seconds per tile, keep the framing block.
- db.py: combine main's plan_tier CASE with the branch's new
  service_namespace/instance/run/parent column updates.
- backfill.py: union reingest re-tagging with config->plan_tier.
- cmd_backfill/cmd_onboard: union new args / restart banner + tile guidance.
- index.html: keep both branches' helpers; route SessionDetailView cost
  cells through fmtFramedDollar and relabel Duration -> Elapsed so the
  merged framing regression tests hold; agent tile uses framed cost_today.
- tests: concatenate both UI regression sets; update two main tests to the
  branch's new contracts (active-session tile, ParsedSession cache-write arg).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
… flags (Metabuilder-Labs#3)

The Claude Code JSONL backfill parser recomputed cost from token/model
metadata but never extracted per-message content or tool-input from the
transcript — even though that data is present and core/transcript.py already
reads it. Because backfill bypasses IngestPipeline/strip_captured_content, the
[capture] toggles were a no-op on the CC path: flipping capture.prompts changed
nothing. This blocked the context-cost diagnostic (Metabuilder-Labs#4), which needs per-message
content + inclusion args to attribute tokens.

parse_claude_code_session now accepts a CaptureConfig and, gated per toggle,
populates the assistant LLM span with gen_ai.prompt.content (the triggering
human prompt) and gen_ai.completion.content (the agent narration), and each
tool span with gen_ai.tool.input (the raw args). The keys match GenAIAttributes
so downstream consumers and alert content-stripping treat backfilled content
identically to live content. ingest_claude_code reads config.capture and
forwards it; iter_claude_code_sessions threads it through.

Extraction is strictly opt-in: the None default and the all-False CaptureConfig
default leave every span's attributes byte-for-byte unchanged ({"source": ...}),
so a default backfill is unchanged and stays 100% local. Reuses _block_text from
core/transcript.py for the record-walking.

Co-Authored-By: Claude <noreply@anthropic.com>
Metabuilder-Labs#4)

Claude Code's built-ins leave a real, documented gap: /compact is reactive,
lossy and single-session; /context shows current-session totals only. Neither
attributes WHAT is burning quota across sessions nor suggests a structural fix.
A whole DIY ecosystem (ccusage, codeburn, context-analyzer, session-recall, ...)
has sprung up to fill it — the strongest revealed-demand signal. Proof point:
anthropics/claude-code#24147, where a dev hand-parsed 30 days of JSONL to find
CLAUDE.md re-reads consumed 99.93% of their quota.

`tj context` runs a local diagnostic over Claude Code sessions and reports:

1. Per-turn context composition — what share of each turn went to RE-READING
   prior context (cache-read tokens: conversation history, CLAUDE.md, accrued
   tool output) vs. NET-NEW WORK (uncached input + output), with the re-read
   overhead named.
2. Recurring inclusions — the same file re-read across many sessions,
   frequency-counted, each with a concrete `@file` / CLAUDE.md structural fix
   (capture-gated on `[capture] tool_inputs`).
3. Compact candidates — sessions whose accumulated re-reading makes a
   mid-session /compact reclaim the most quota.

Framing is quota-native (the subscription majority): headline numbers render as
token-share / "% of cycle tokens" via core/framing.py — the single source of
truth for plan-tier-aware rendering — for Pro/Max users. Dollars are a SECONDARY
calibration signal for API users, never the headline. Output is a
screenshottable terminal card; `--json` for machine-readable output.

Honesty discipline (CLAUDE.md Rule 14): every figure is a measured token share
or a structural candidate flag, never a guaranteed saving; re-read tokens are
cache reads (billed at a reduced rate, not free) — real quota, stated as such.

Needs a direct DuckDB connection (reads the raw `attributes` column, which the
API shim doesn't expose) — fails gracefully with a `tj stop` hint when the
daemon holds the lock, mirroring `tj report --trim`.

Scoped OUT of v1 (named, not faked): MCP schema-injection attribution and
prompt-cache-miss attribution aren't derivable from current backfill data;
recurring-inclusion detection covers file Reads only. Per the Metabuilder-Labs#3 handoff,
content lands only on freshly-ingested spans after enabling `[capture]` — the
command surfaces that nudge rather than silently showing empty recurring data.

Covered by tests/unit/test_context_diagnostic.py over a synthetic multi-session
fixture: per-turn re-read-vs-work composition, cross-session recurring-inclusion
detection with structural fix, compact-candidate detection, the capture-off
nudge, and end-to-end CLI rendering of quota-share for a Max plan.

Co-Authored-By: Claude <noreply@anthropic.com>
…abuilder-Labs#6)

Distribution research (adoption.md) is blunt: 15-second-TTFV `npx` tools
dramatically out-adopt 2-minute-install tools, and every high-star CLI in
this niche has sub-30s time-to-first-value. TokenJam's `pipx install` +
launchd/systemd daemon + onboarding is great for power users but measurably
higher friction than `npx ccusage` for the Claude Code crowd — who reach for
`npx` first and whose JSONL tj reads is the very same file ccusage parses.

This ships a no-setup first-run path so a user with zero prior setup runs ONE
command and sees where their Claude Code quota actually goes in well under 30s.

- `tj quickstart` (cmd_quickstart.py): opens a transient InMemoryBackend
  (nothing written to ~/.tj, no config read/written, no daemon started or
  contacted), backfills ~/.claude/projects/*.jsonl into it, then renders quota
  composition (reusing Metabuilder-Labs#4's context_diagnostic engine) + a session timeline
  (new core/session_timeline.py). Output leads with ccusage-parity framing.
  Registered in `no_db_commands` so the CLI never opens the on-disk DB / trips
  the daemon lock for it.
- Bare `tj` (no subcommand) routes to quickstart — the group is now
  `invoke_without_command=True`, so `uvx --from tokenjam tj` is one command.
  `--help`/`--version` are eager and still short-circuit.
- `npm-wrapper/`: a dependency-free npm package named `tj` so `npx tj` works —
  a thin launcher that shells out to the Python CLI via the first available
  runner (uvx -> pipx run -> installed tj) and passes args through. NOT
  published here (build + document only).
- Docs: README + docs/installation.md lead with `npx tj` / `uvx --from
  tokenjam tj` as the primary starting point; `tj onboard` stays the opt-in
  "go deeper" (daemon/MCP/live) path.

Honesty discipline (Rule 14) preserved: every figure is a measured token share
re-derived from the JSONL, never a projected saving.

Tests: tests/unit/test_quickstart.py covers the timeline core, the CLI render,
the JSON output, the no-logs branch, and that quickstart + bare `tj` never open
the on-disk DB (open_db patched to raise). Validated with `make all` — ruff +
mypy clean, full suite 1171 passed. `uvx --from <repo> tj` and the npm wrapper
passthrough were exercised locally end-to-end.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Every other dollar-bearing read route (/cost, /optimize, /budget, /traces,
/status) returns a `framing` block (core/framing.py, Metabuilder-Labs#191), but the session
detail endpoint did not. During the origin/main merge SessionDetailView's
Overview / subagents / traces cost cells were routed through
fmtFramedDollar(..., framing) to satisfy main's global framing regression,
yet framing resolved to null there — so fmtFramedDollar silently fell back to
raw fmtCost and subscription users (e.g. Claude Max) saw verbatim dollars on
the session detail page that are suppressed everywhere else.

Compute a framing block in get_session_detail via compute_framing (the single
source of truth) with a window-independent plan mix scoped to the session's
agent (plan_determination_mix, mirroring /status and /traces), passing the
session's own token total + cost as window totals so the subscription
token-share denominator is present. No UI change needed — SessionDetailView
already reads data.framing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add `tj quota-audit` — the accountability companion to opusplan / `/model`.
Those are forward-looking and say nothing about session history; nothing
answered the backward-looking question "which of my PAST Opus sessions were
Sonnet-shaped?". This runs the structural downsize heuristic retroactively,
scoped to Opus sessions, and reports:

- the headline "% of your Opus quota reclaimable from Sonnet-shaped sessions"
  computed as candidate-Opus-tokens / total-Opus-tokens — quota language, not
  dollars (the subscription majority is on a flat fee, so dollar framing
  mis-targets them; implied dollars are a secondary API-only calibration line);
- the specific example sessions to spot-check;
- an optional tuned routing-config export (`--export-config claude-code`,
  reusing the existing snippet generator via a DowngradeFinding shim).

Framed as an audit (quota terms) with the honesty caveat always visible —
"candidates to spot-check, never safe-to-downgrade". Computes purely from
already-backfilled token/model metadata; does not depend on captured content.

`audit_opus_quota()` + `OpusQuotaAudit` / `OpusAuditExample` extend
model_downgrade.py / types.py; framing reuses core/framing (Metabuilder-Labs#4 pattern,
window-independent plan mix per Metabuilder-Labs#177). Covered by a synthetic-mix test
(Sonnet-shaped Opus candidates vs Opus-shaped Opus vs excluded Sonnet sessions)
asserting the % reclaimable, the spot-check listing, and the caveat render.

Co-Authored-By: Claude <noreply@anthropic.com>
…y card

The "tokenmaxxing → tokenminimizing" cultural shift (June 2026) means a
spend-brag card now repels while an efficiency/quota card aligns, and the
subscription majority has no dollar spend to brag about at all. The old card
rendered an (ironic) spend-tier ladder — the wrong polarity.

Reframe the card to a quota/efficiency artifact:
- Lead with the context-COMPOSITION headline from Metabuilder-Labs#4's
  compute_context_diagnostic: what share of quota went to overhead (re-reading
  history / CLAUDE.md / tool output) vs real work (uncached input + output).
- Classify into 5 efficiency tiers keyed on the overhead share (lower = leaner
  = a better tier), replacing the 6-tier spend ladder.
- Render quota-native via core/framing, mirroring Metabuilder-Labs#5's polarity: headline is a
  token-share / "% of cycle tokens"; dollars are demoted to a secondary
  "Implied API value" line shown only for api plans, suppressed for
  subscription / local / unknown. No dollar spend-brag for subscription users.
- Frame the action line as reclaimable quota (points at tj context).
- Add a --weekly "Quota Wrapped" recap preset (7-day window + recap copy).

The card now needs a direct DuckDB connection (it reads context composition),
matching tj context / tj quota-audit. Tests rewritten to assert the efficiency
framing (composition + reclaimed, not spend-brag) and quota-native suppression
of dollars for a subscription plan, plus weekly mode and the api secondary line.
Updated the two forward-looking release checklists and CLAUDE.md to match.

Co-Authored-By: Claude <noreply@anthropic.com>
…bs#8)

Real Claude Code backfills carry the current Anthropic model ids, but
`claude-fable-5` and `claude-sonnet-4-5` had no entry in the packaged
pricing table. `get_rates` returned None for them, so `calculate_cost`
fell back to the $0.50/$2.00-per-MTok default and logged a "No pricing
data" warning — silently mis-pricing the secondary API-plan "implied
value / calibration" dollar line in `tj context`, `tj quota-audit`, and
`tj tokenmaxx`. Surfaced while building Metabuilder-Labs#6.

Add both entries with the published per-MTok rates
(https://platform.claude.com/docs/en/pricing):
  - claude-fable-5: input 10.00 / output 50.00 / cache-read 1.00 /
    cache-write(5m) 12.50
  - claude-sonnet-4-5: input 3.00 / output 15.00 / cache-read 0.30 /
    cache-write(5m) 3.75 (same published tier as Sonnet 4.6)

The other current ids the backfill warns about — claude-opus-4-8,
claude-sonnet-4-6, claude-haiku-4-5 — were already present.

Extend tests/unit/test_pricing_override.py to assert each current
Anthropic model resolves to its packaged rate, that none falls through
to the default fallback, and that the dated YYYYMMDD variant resolves
via the existing suffix-strip path.

Co-Authored-By: Claude <noreply@anthropic.com>
…ns on --reingest

Enabling [capture] after a Claude Code session is already ingested never
landed message content / tool_input on the existing rows: the idempotent
INSERT skips on span_id conflict, and the --reingest path updated only
sub_agent_id, leaving the attributes column untouched. So Metabuilder-Labs#4's
recurring-inclusion detection (which reads that content) only worked
against a fresh DB.

The --reingest UPDATE now also overlays the freshly-parsed span's
attributes over the stored ones (parsed wins per key) via a small
_merge_attributes helper. This ADDS the content keys
(gen_ai.prompt.content / completion.content / tool.input) without
discarding any keys the stored row already carried (e.g. from live
ingest). Capture-off reingest stays a no-op — the parsed attributes are
just {"source": ...}, which the stored row already has — so it never
wipes content a prior capture-on backfill stored.

Two tests added: (1) ingest with capture OFF, enable capture, re-run with
--reingest -> existing spans now carry content/tool_input, no fresh DB,
no new rows; (2) a capture-off --reingest does not delete previously
stored content.

Co-Authored-By: Claude <noreply@anthropic.com>
…laude history

The zero-install first run (`tj quickstart` / bare `tj`) backfills
~/.claude/projects/*.jsonl into a transient in-memory DB before printing
the headline. Profiling a 3000-session synthetic history showed the
backfill dominates (99.7s) while the diagnostic (0.45s) and timeline
(0.04s) are trivial — the cost is the per-span SELECT+INSERT round-trips
in `_insert_session_idempotent` (120k spans => 240k DuckDB statements).
A first run over a large history blew well past the <30s time-to-first-
value goal Metabuilder-Labs#6 targets.

Add a most-recent-N-sessions cap to the first-run path: `iter_claude_code_
sessions` gains an optional `max_sessions` that walks files mtime-desc and
stops once the cap is reached, bounding both parse and insert work
regardless of how large ~/.claude is. `ingest_claude_code` threads it
through and sets `BackfillResult.limit_reached`. `cmd_quickstart` caps at
the most-recent 300 sessions by default (≈9s even on an 8000-session
history), exposes `--full` to lift the cap, and discloses the truncation
honestly ("Showing your most-recent N sessions … run tj quickstart --full
or tj context for your full history") so it never reads as "this is
everything". The full `tj backfill claude-code` path passes no cap and is
byte-for-byte unchanged.

Tests assert BOUNDED WORK (only N sessions/rows ingested on a large
synthetic history, the cap keeps the freshest sessions, the limit flag and
CLI disclosure render) rather than a flaky wall-clock number.

Co-Authored-By: Claude <noreply@anthropic.com>
…ction beyond file reads

Metabuilder-Labs#4's recurring-inclusion detection only covered file Reads (group by
`file_path`, frequency-count across sessions, suggest `@file`/CLAUDE.md).
Repeated prompts, repeated searches, and large repeated tool outputs also
re-paste across turns/sessions and burn the same quota, but went undetected.

Generalize the detector into one `(inclusion_type, tool_name, signature)`
aggregation carrying a distinct-session set + occurrence count, then flag four
kinds, each with a type-appropriate structural fix:
  * file reads (Read/View/Cat `file_path`) -> `@file` / CLAUDE.md;
  * searches (Grep/Glob/Search query/pattern) -> pin / capture the result;
  * prompts (identical `gen_ai.prompt.content` re-sent) -> save as a
    slash-command / CLAUDE.md note;
  * large tool outputs (identical `gen_ai.tool.output` >= 2K chars re-pasted)
    -> reference the artifact instead of re-running.

Each kind is independently capture-gated by its `[capture]` toggle
(tool_inputs / prompts / tool_outputs), so default-off behavior is unchanged
and a flag-off kind contributes nothing. File reads / searches gate on distinct
sessions (a single session re-reading a file isn't structural); prompts /
outputs re-paste across turns within a session too, so they gate on raw
occurrence count. Framing stays quota-native (Metabuilder-Labs#4). Tool outputs are only
captured on the live ingest path, not the on-disk transcript — the nudge note
says so. `RecurringInclusion` now carries `inclusion_type`; the CLI renders a
per-kind tag and the JSON payload round-trips it.

Extends the synthetic fixture to exercise recurring Grep, prompts, and large
outputs, asserting each is detected with frequency + the structural fix, plus a
size-gate test and a per-flag gating test.

Co-Authored-By: Claude <noreply@anthropic.com>
…med overhead source

`tj context` (Metabuilder-Labs#4) named re-read overhead (cache reads) but not the two
specific large sources Metabuilder-Labs#11 asked for: MCP schema-injection and prompt-
cache-miss tokens. This ships the derivable half and parks the other with
a precise note, keeping the honesty discipline (Rule 14) — no invented
attribution numbers.

Cache-miss (DERIVABLE): cache-creation tokens (NormalizedSpan.cache_write_
tokens, from the on-disk usage block's cache_creation_input_tokens) are
input that missed the cache and had to be written to it, billed by
Anthropic at a premium. They were already summed into total_cache_write_
tokens and the composition denominator but never named as their own
overhead category — so the re-read and net-new-work shares silently failed
to sum to 100% whenever cache writes were present. Now surfaced as a
distinct `cache_miss` category: total_cache_miss_tokens / cache_miss_share
on the diagnostic, cache_miss_tokens / cache_miss_share per turn, a new
"Cache-miss:" breakdown line in the card (shown only when non-zero so
default output is unchanged), and corresponding JSON fields.

MCP schema-injection (~25K tok/call) (PARKED): not derivable from current
data. Neither the on-disk transcript nor live spans carry per-tool /
per-schema token attribution — tool-definition tokens are folded into
input/cache-creation, and `mcp__`-prefixed names appear only on tool-
invocation spans (no schema-injection token count). Surfaced as
MCP_INJECTION_PARK_NOTE (a diagnostic note + a JSON field) naming exactly
what data would be needed (a per-request tool-schema token delta) rather
than fabricating a figure.

Tests: extend the synthetic fixture so session B pays a cache-creation
premium; assert the named cache-miss category is computed (core + per-turn
+ JSON), rendered in the card, and that the MCP half is parked with a
precise note. Default-off behavior unchanged. Full suite green
(1207 passed), ruff + mypy clean.

Co-Authored-By: Claude <noreply@anthropic.com>
…abuilder-Labs#14)

The claude-haiku-4-5 entry in pricing/models.toml carried the retired
Haiku 3.5 rates (input 0.80 / output 4.00 / cache_read 0.08 /
cache_write 1.00 per MTok) — identical to the claude-haiku-3-5 row. The
entry existed, so it did not trip the Metabuilder-Labs#8 "No pricing data" warning, but it
silently understated the Haiku-4.5 API-plan dollar calibration line.

Update to the current published Haiku 4.5 rates (input 1.00 / output 5.00 /
cache_read 0.10 / cache_write 1.25 per MTok, 5-minute cache-write column),
verified against platform.claude.com/docs/en/about-claude/pricing. Haiku
3.5 keeps its own old rates for the Bedrock/Vertex date-suffix fallback.

Tests: add claude-haiku-4-5 to the Metabuilder-Labs#8 current-Anthropic coverage dict and a
dedicated regression asserting it no longer carries the retired 3.5 figures
(and that 3.5 still does). Update the dollar/rate assertions across
test_cost.py, test_cost_tracking.py, test_provider_attribution_194.py, and
test_cost_charts.py that were pinned to the old Haiku rate.

Co-Authored-By: Claude <noreply@anthropic.com>
…etabuilder-Labs#15)

The Claude Code backfill ran one existence-check SELECT plus one INSERT per
span in `_insert_session_idempotent` — ~2 DuckDB round-trips per span, so a
large history (~120k spans) paid ~240k statements (~100s). Metabuilder-Labs#13 bounded only
the quickstart first-run path; the full `tj backfill claude-code` / daemon
path still paid the full per-span cost.

Replace the per-span loop with a bulk path that processes a whole session in
a BOUNDED number of statements regardless of span count:

- Partition new-vs-existing span_ids in ONE chunked `WHERE span_id IN (...)`
  query (`_existing_span_ids`) instead of N existence SELECTs.
- Bulk-insert the new spans in a single `executemany` (`_SPAN_INSERT_SQL` +
  `_span_insert_params`, kept in lock-step with `DuckDBBackend.insert_span`).
- On `--reingest`, batch-load the existing rows' attributes in ONE chunked
  query (`_load_attrs_bulk`), compute the per-key attribute merge in Python,
  then apply the updates in a single `executemany`.

The Metabuilder-Labs#10 idempotency + reingest contract is preserved exactly: new spans
insert; existing spans without `--reingest` are skipped untouched; existing
spans with `--reingest` get `sub_agent_id` updated and `attributes` per-key
merged (overlay `{**stored, **parsed}`, parsed wins, never wipes a stored
key). The no-`conn` fallback path is unchanged. Drops the now-unused
single-span `_load_attrs` helper.

Tests: all existing Metabuilder-Labs#3/Metabuilder-Labs#10/Metabuilder-Labs#13 backfill tests pass unchanged. Adds four Metabuilder-Labs#15
tests — bulk insert lands every new span; existing spans are skipped without
`--reingest`; `--reingest` still merges attributes (overlay, no wipe) +
updates sub_agent_id; and a counting-connection test asserting a 400-span
session inserts in a BOUNDED number of statements (well under N execute()
calls, via executemany) rather than ~2 per span.

Co-Authored-By: Claude <noreply@anthropic.com>
…lar (Metabuilder-Labs#17)

Metabuilder-Labs#2 (commit f845571) added a `framing` block to /api/v1/sessions/{id} and
claimed "no UI change needed," but two dollar-bearing cells in the served
SPA still called bare `fmtCost(...)`. For a Max-subscription user
(framing.display_rule = suppress_dollars_for_subscription_share) this
leaked raw dollars: the session-detail "Cost & Tokens" / "Implied API
value" panel rendered e.g. "$198.9709" (and "$0.0000" on empty sessions),
and the Status "Archived sessions" table cost column showed raw "$0.0000".
The panel only relabeled the header for subscription users while still
emitting the raw figure — the exact honesty leak Metabuilder-Labs#191/Metabuilder-Labs#2 exist to prevent.

Both cells now route through the existing `fmtFramedDollar(value, framing)`
helper, matching how Traces/Cost/Optimize/Analytics render: subscription
users see token-share ("% of cycle"), local sees "—", and only api/unknown
plans see raw "$". The `framing` block is already in scope in both views —
SessionDetailView reads `data.framing` off /sessions/{id}; the Status view
reads `data.framing` off /status (already consumed by the agent cards'
"Cost today" cell). No backend change needed.

Guarded by two new static-grep regressions in
tests/unit/test_lens_ui_regression.py asserting the bare-fmtCost patterns
are gone and the framed forms are present (there is no JS runner in the
Python CI job; the served source is the unit under test).

Co-Authored-By: Claude <noreply@anthropic.com>
…ion aggregate

The Status archived-sessions list and the /api/v1/sessions/{id} detail view
read tokens/cost straight off the denormalized `sessions` aggregate columns.
Those columns are accumulated per span keyed by `span.session_id` in
`IngestPipeline._build_or_update_session`. That is correct when telemetry
stamps the session id on its cost spans (the /v1/logs Claude Code/Codex path
and the on-disk backfill both do). But a fan-out harness posting raw OTLP to
/api/v1/spans emits the zero-cost `invoke_agent` marker span WITH `session.id`
while its cost-bearing `gen_ai.llm.call` spans carry only `agent_id` + a shared
`traceId` (otlp_parsing reads the optional `attrs.get("session.id")`). Those
cost spans arrive with session_id=None, so `_resolve_session` mints a throwaway
UUID for each and they never accumulate onto the marker's session row. The row
stays 0 even though the spend is real and surfaces correctly on Cost / Analytics
/ Traces, which key by agent_id / trace and never by session_id.

The defensible association is the trace: the marker span (which carries the
session_id) and the cost spans share a `trace_id` — exactly the join /traces
already uses to attribute the same spans. New `session_token_cost_rollup()` in
core/db.py rolls up every span whose `session_id` matches OR whose `trace_id`
appears on a span carrying this session_id. It only follows trace linkage that
originates from a span already bearing the session_id, so it never pulls in
unrelated traces and never resorts to a lossy agent_id+time-window heuristic
that could cross-attribute concurrent sessions. When the cost spans DO carry the
session_id (the common case) the trace clause is redundant and the result equals
the plain session_id sum — a strict superset that never under- or over-counts
the already-correct paths. Each span is counted once (a disjunction over rows,
not a self-join), so no double-counting. Returns None when the session has no
spans, and both callers fall back to the stored row in that case.

Status `_build_archive` and the session-detail `session` block + framing now
report the rolled-up figures, reconciling per-session rows with /api/v1/cost.

Covered by three integration tests over a fixture that mirrors the harness shape
(session + marker carry session_id+trace; cost spans carry only trace+agent_id):
they assert the session-detail row, the Status archived row, and reconciliation
with /api/v1/cost's agent total.

Co-Authored-By: Claude <noreply@anthropic.com>
…not live agents

The Overview/landing tab rendered the "No data yet — TokenJam is listening"
onboarding empty-state whenever the LIVE-agents list (/status `agents`) was
empty — even when the DB was full of historical/backfilled spans (344M tokens,
$23K) that Cost/Analytics/Traces all showed. Since `/status` `agents` carries
only currently-active session tiles, an all-historical DB (nothing streaming)
read as "all my data is gone" on the front door — a false data-loss scare
during dogfooding.

Reserve the onboarding empty-state for a genuinely empty DB. The Overview now
decides "empty" from whether ANY data exists rather than from the live-agents
list: window totals from /cost (`total_tokens` / `total_cost_usd` / `rows` —
the same load-bearing signal the other screens use) OR any historical/live
session in /status (`agents` or `archived`). /status moves into the existing
parallel fan-out with a non-fatal `.catch` (a failing /status must not blank
the Overview), preserving the Metabuilder-Labs#124 asymmetric error-handling contract (/cost
stays load-bearing with no catch).

Frontend-only fix. Guarded by a static-grep regression test asserting the
buggy live-agents-only gate is gone and the has-data/has-session gate is
present, plus an API test proving a historical-only DB (closed session + cost
spans, no live agents) reports `agents == []` but a non-empty `archived` and
non-zero /cost totals — the exact signals the new gate keys off.

Co-Authored-By: Claude <noreply@anthropic.com>
…der-Labs#20)

The Traces tab initialized `traces` to an empty array and rendered the
"No traces yet" empty-state immediately on first paint — before the
/api/v1/traces fetch resolved. On a slow fetch this flashed for ~0.5–1s
even when the API returned real rows, misreading as "no data."

Add a `loaded` flag (mirroring the Cost view's `loading` pattern) that
flips true only once the first fetch settles. While `!loaded`, render a
loading shimmer; the "No traces yet" empty-state is now gated on `loaded`
so it can only appear after a fetch genuinely returned zero rows.

The Cost view already uses a `loading` flag + shimmer for its chart body,
so it does not flash a fake empty table; the residual `$0.0000` in the
chart-header total is a separate, narrower change and is left out of scope
per the ticket.

Guarded by a static-grep regression test asserting the Traces view
distinguishes the loading state (shimmer) from the loaded-empty state.

Co-Authored-By: Claude <noreply@anthropic.com>
Bring mainline (v0.5.2, +107 commits: proxy/policy, Analytics, pricing
overrides, Dashboard UI, request capture Metabuilder-Labs#209, resume-dedup Metabuilder-Labs#294) into the
session-status branch. Notable reconciliations:

- db.py migrations: kept main's RELEASED migrations 6 (policy tables) and 7
  (request capture) at their version numbers; renumbered the branch's session
  migrations to 8-13. Safe because run_migrations tracks applied versions as a
  set, and 6/7 already shipped to production DBs. insert_span now carries all
  28 columns; kept the _trace_filter_where refactor + the close_* methods.
- backfill.py: combined main's dedup-by-message.id (Metabuilder-Labs#294) with the branch's
  sub_agent_id + content-capture + cache-write tracking; kept the bulk-insert
  path. Totals computed from deduped spans, including cache_write.
- UI (index.html): adopted main's Dashboard direction (Overview retired),
  grafting the branch's StatusView project-grouping/archived/clickable cards,
  the Traces loading shimmer (Metabuilder-Labs#20), and PORTING the Metabuilder-Labs#19 backfilled-DB
  empty-state fix into DashboardView.
- Bare `tj`: adopted main's branded home screen (Metabuilder-Labs#240); `tj quickstart` stays
  the explicit zero-install first run. Onboard project name now defaults to the
  derived/--project value (no interactive prompt) to keep main's plan-first flow.

All 1397 tests pass; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@anshss anshss requested a review from anilmurty as a code owner June 25, 2026 13:08
@anilmurty

Copy link
Copy Markdown
Contributor

look who's back! :) -- big change, will review thoroughly

@anilmurty

Copy link
Copy Markdown
Contributor

@anshss this is seriously impressive, and it honestly made my week to see you back on this project.

I went through it properly, so a few specifics first. The lifecycle root cause is a great catch: the zero-duration invoke_agent span from Claude Code's logs export getting read as a completion, so every live session force-completed on its first prompt. That one's been quietly wrong on the Status page for a while and you nailed the diagnosis. The merge reconciliation was also really clean, keeping main's released migrations and renumbering yours to 8 through 13, folding in the recent message.id dedup, porting the empty-state fix. That made 15k lines a lot easier to follow.

Here's the honest lay of the land, and it's mostly about timing. You dropped off right as we were narrowing hard into cost-optimization, and the North Star moved while you were out: cost first, with observability as the substrate underneath rather than the headline. A lot of this PR is the session-timelines / agent-debugging direction you'd been building (the Timeline play-by-play, the Map, the subagent trees), which was exactly the right instinct for where we were back in May, and is genuinely beautiful work.

So rather than one big merge, I think the cleanest way to look at it is two lanes.

The cost lane, squarely where TJ is now, and I'd love all of it:

  • the lifecycle fix (it's also what makes per-session cost accurate)
  • sub-agent cost attribution plus the right-sizing analyzer
  • the cost/pricing correctness (cache-creation split, current rates, sessions framing, bulk-insert perf)
  • tj context, the tokenmaxx quota card, the Opus quota audit

I'd love to land the lifecycle fix first and on its own since it's a real bug and deserves to ship fast, then the rest of the cost pieces as focused follow-ups.

The observability lane (aka old OCW focus), which is more of a conversation than a merge:

  • the session-detail Timeline, the Map, run grouping, the session-status tiers
  • not because it isn't great (it is), but because pure "what did my agent do" debugging is the crowded category we deliberately stepped back from (and chose to work with existing solutions rather than build our own). The interesting part is your Map already fuses cost into the activity view ($110, "1 flagged for right-sizing"), and that version, where it's really answering "where did the spend go and what's over-provisioned," is very much on-mission. So it's less "drop this" and more "let's figure out together which pieces we point at waste and spend versus pure activity."

Can we jam on that part? I'll DM you too. Short version: the cost pieces I want now, lifecycle first, and the timeline/map work I want to think through with you so we aim it at where TJ is headed rather than where it was.

Either way, really glad to have you back on this.

@anilmurty anilmurty left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documented in previous comment. After our discussion we can keep all the feature changes but ideally should split into separate PRs to limit risk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants