TokenJam reads your agent's telemetry and tells you when to downsize, when to trim prompts, what to cache, what to script, and what plans you've already paid to figure out — then shows it all in a local browser dashboard. Runs entirely on your machine.
pipx install tokenjam
Don't have pipx? brew install pipx on macOS, apt install pipx on Debian/Ubuntu, or see docs/installation.md. pip install tokenjam also works in a clean venv.
No cloud · No signup · No vendor lock-in
⭐ If TokenJam saves you tokens, star it · 👁 Watch for releases — we ship often
TokenJam reads telemetry from every major agent runtime, framework, provider, and observability tool and surfaces savings across five areas — then brings them together in a local browser dashboard.

| Area | What it does |
|---|---|
| Downsize | Flags sessions where a cheaper model in the same family is worth a look. Never claims quality equivalence — surfaces examples so you can spot-check. |
| Cache | Shows your current caching ratio per (provider, model) and suggests Anthropic prompt-cache breakpoints from stable prefixes in your real usage. |
| Script | Finds clusters of deterministic … |
| Trim | Predicts which regions of your prompts the model gives little weight to. Surfaces what's safe to cut. |
| Reuse | Detects clusters of sessions where your agent re-plans the same work and exports reviewable skeleton templates you can drop into a slash command or script. |
| Lens | A local browser dashboard that brings every analyzer's findings, your real spend, and your alerts together in one place. No cloud, no signup, fully offline. |
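
To make the Cache row concrete, here is a minimal sketch of the two quantities it describes: a caching ratio per (provider, model), and a stable prefix shared by every prompt that could serve as a prompt-cache breakpoint. This is not TokenJam's implementation. The record field names (`input`, `cache_write`, `cache_read`), the ratio definition (cache reads over all input tokens), the line-boundary trimming, the `min_chars` cutoff, and the model name `claude-x` are all assumptions for illustration.

```python
import os
from collections import defaultdict


# Hypothetical usage records; field names are illustrative, not TokenJam's schema.
# input = uncached input tokens, cache_write = tokens written to the cache,
# cache_read = tokens served from the cache.
def caching_ratio(calls):
    """Share of all input tokens served from cache, keyed by (provider, model)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [cache_read, total_input]
    for c in calls:
        key = (c["provider"], c["model"])
        totals[key][0] += c["cache_read"]
        totals[key][1] += c["input"] + c["cache_write"] + c["cache_read"]
    return {key: (read / total if total else 0.0) for key, (read, total) in totals.items()}


def stable_prefix(prompts, min_chars=20):
    """Longest prefix every prompt shares, trimmed back to its last line
    boundary when it contains one. A long, stable prefix is a natural place
    for a prompt-cache breakpoint; min_chars is an arbitrary cutoff."""
    prefix = os.path.commonprefix(prompts)
    cut = prefix.rfind("\n")
    if cut >= 0:
        prefix = prefix[: cut + 1]
    return prefix if len(prefix) >= min_chars else ""


if __name__ == "__main__":
    calls = [
        {"provider": "anthropic", "model": "claude-x", "input": 100, "cache_write": 0, "cache_read": 300},
        {"provider": "anthropic", "model": "claude-x", "input": 200, "cache_write": 400, "cache_read": 0},
    ]
    print(caching_ratio(calls))
    prompts = [
        "SYSTEM: you are a bot.\nTools: a, b\nUser: hi",
        "SYSTEM: you are a bot.\nTools: a, b\nUser: bye",
    ]
    print(repr(stable_prefix(prompts)))
```

Trimming to a line boundary keeps the suggested breakpoint from landing mid-sentence when two prompts happen to share a few extra characters.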
Run all five analyzers with tj optimize. Run several with tj optimize downsize cache reuse.
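For a rough intuition about what the Reuse analyzer looks for, the sketch below groups plan texts whose word sets overlap heavily, measured with Jaccard similarity. It is a toy stand-in chosen for illustration, not the analyzer's actual method. The crude tokenizer, the greedy single-pass clustering, and the 0.6 threshold are all assumptions.

```python
import re


def plan_tokens(plan: str) -> frozenset:
    """Lowercased word set of a plan; a deliberately crude normalization."""
    return frozenset(re.findall(r"[a-z0-9_]+", plan.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A & B| / |A | B|, treating two empty sets as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def repeated_plans(plans, threshold=0.6):
    """Greedy single pass: each plan joins the first cluster whose first member
    is similar enough, otherwise it starts a new cluster. Only clusters with
    two or more members are returned, i.e. work that was planned more than once."""
    clusters = []  # list of (representative token set, member indices)
    for i, plan in enumerate(plans):
        toks = plan_tokens(plan)
        for rep, members in clusters:
            if jaccard(rep, toks) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((toks, [i]))
    return [members for _, members in clusters if len(members) > 1]


if __name__ == "__main__":
    plans = [
        "Read config, run tests, fix lint errors",
        "read config; run tests; fix lint errors",
        "Write release notes for v2",
    ]
    print(repeated_plans(plans))
```

Each returned cluster is a candidate for a reusable skeleton; a real analyzer would need a sturdier similarity measure than bag-of-words overlap.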
For Claude Code users — zero code, auto-backfills your last 30 days:
pipx install tokenjam
tj onboard --claude-code
tj optimize # cost-saving candidates from your actual usage
tj serve # open the dashboard at http://127.0.0.1:7391/
That's it. Run tj any time and it points you to the next best action:
TokenJam · cost-optimization for AI agents · local-first, OTel-native · no signup
You're set up. Next best actions:
tj status agent overview — what's running, recent cost
tj tokenmaxx your shareable spend tier
tj optimize cost-saving candidates from your usage
tj serve open Lens (web UI) at http://127.0.0.1:7391/
To upgrade later: pipx upgrade tokenjam (then tj stop && tj serve & to reload the daemon, and tj --version to verify). See docs/installation.md.
For any Python agent:
from tokenjam.sdk import watch
from tokenjam.sdk.integrations.anthropic import patch_anthropic
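patch_anthropic()  # patch the Anthropic SDK so its calls are traced

@watch(agent_id="my-agent")  # trace this function under agent_id "my-agent"
def run(task: str) -> str:
    ...  # your agent logic goes here
→ Python SDK · TypeScript SDK · Codex · OTel-compatible agents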
tj serve runs Lens at http://127.0.0.1:7391/: a Dashboard that opens on recoverable waste and overall health at a glance, with an embedded explorer for slicing your usage any way you like (metric × dimension × chart), plus Status, Traces, Cost, Analytics, Alerts, Drift, Optimize, and Budget screens. Lens is plan-tier-aware, fully offline, and needs no signup.
→ tokenjam.dev/products/lens for the visual walkthrough.
TokenJam is also a full observability stack. The five analyzers and Lens ride on top.
- Real-time cost tracking — every LLM call priced as it happens
- Safety alerts — 13 alert types, 6 channels (ntfy, Discord, Telegram, webhook, file, stdout)
- Behavioral drift detection — Z-score baselines, no LLM required
- Schema validation — declare or infer JSON Schema for tool outputs
- OTel-native — point any OTLP exporter at tj serve and you're done
- MCP server — lets Claude Code query its own telemetry mid-session
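The drift bullet above can be illustrated with a plain Z-score check: compare a new observation of some per-session metric (for example, tool calls per session) against the mean and standard deviation of a baseline window. This is a minimal sketch assuming a sample standard deviation and a 3.0 threshold; the metric, window, and threshold TokenJam actually uses may differ.

```python
import math
import statistics


def zscore(baseline: list[float], value: float) -> float:
    """How many (sample) standard deviations `value` sits from the baseline mean."""
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)  # needs at least two baseline points
    if sd == 0:
        # A perfectly flat baseline: any change at all is maximally surprising.
        return 0.0 if value == mean else math.copysign(math.inf, value - mean)
    return (value - mean) / sd


def is_drifting(baseline: list[float], value: float, threshold: float = 3.0) -> bool:
    """Flag the observation when |z| meets the (illustrative) threshold."""
    return abs(zscore(baseline, value)) >= threshold


if __name__ == "__main__":
    # Hypothetical tool calls per session over the baseline window.
    baseline = [10, 12, 11, 9, 13]
    for observed in (12, 30):
        print(observed, round(zscore(baseline, observed), 2), is_drifting(baseline, observed))
```

Because the check needs only arithmetic over stored telemetry, it runs locally with no model in the loop, which matches the "no LLM required" claim.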
tj optimize # all five cost-optimization analyzers
tj optimize downsize # one analyzer (positional args)
tj tokenmaxx # shareable spend-tier callout
tj status # current cost, tokens, active alerts
tj cost --since 7d # spend by agent / model / day / tool
tj alerts # everything that fired while you were away
tj drift # behavioral drift Z-scores
tj report --reuse # HTML + Markdown skeleton export for the Reuse analyzer
tj backfill claude-code # ingest historical ~/.claude/projects/ sessions
tj serve # start Lens + REST API

| Topic | Where |
|---|---|
| 🪶 Downsize / Cache / Script / Trim deep-dives | docs/optimize/ |
| 🔁 Reuse analyzer deep-dive | docs/optimize/reuse.md |
| Claude Code & Codex integration | docs/claude-code-integration.md |
| Python SDK reference | docs/python-sdk.md |
| TypeScript SDK reference | docs/typescript-sdk.md |
| Framework support (LangChain / CrewAI / etc.) | docs/framework-support.md |
| Alert channels & rule reference | docs/alerts.md |
| Backfill from Langfuse / Helicone / OTLP | docs/backfill/ |
| Configuration | docs/configuration.md |
| Architecture deep-dive | docs/architecture.md |
| Installation extras (Trim, framework patches) | docs/installation.md |
| Export to Grafana / Datadog / NDJSON | docs/export.md |
| NemoClaw sandbox observer | docs/nemoclaw-integration.md |
| Release notes | GitHub Releases |
Shipped in 0.3.x: Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview
Shipped in 0.4.x:
- TokenJam Lens — local dashboard rebrand: Overview triage front-door, Optimize detail tab, real spend-over-time charts, cross-screen drill-through
- Reuse analyzer — fifth analyzer: detects clusters of sessions with repeated planning, exports reviewable skeleton templates you can convert into slash commands or scripts
- Daemon DB concurrency — per-thread DuckDB cursors so the Overview's fan-out doesn't block on a single shared connection (v0.4.1)
- Cache cost transparency — cache_read + cache_write token columns surfaced in CLI + UI + API (the previously hidden ~91% cost driver on cache-heavy workloads)
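The cache cost transparency item is easiest to see with arithmetic: cache reads and writes are billed at their own per-token rates, so on cache-heavy workloads they can account for most of a call's price. Below is a minimal sketch of per-call pricing that keeps cache_read and cache_write as separate line items. The Rates field names and the prices in the usage example are made-up placeholders, not values from tokenjam/pricing/models.toml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens for one model (field names are hypothetical)."""
    input: float
    output: float
    cache_write: float
    cache_read: float


def cost_breakdown(rates: Rates, *, input_tokens=0, output_tokens=0,
                   cache_write=0, cache_read=0) -> dict[str, float]:
    """Price one call, keeping each token column as its own line item."""
    per_million = 1_000_000
    return {
        "input": input_tokens * rates.input / per_million,
        "output": output_tokens * rates.output / per_million,
        "cache_write": cache_write * rates.cache_write / per_million,
        "cache_read": cache_read * rates.cache_read / per_million,
    }


def cache_share(breakdown: dict[str, float]) -> float:
    """Fraction of the call's cost that comes from cache reads and writes."""
    total = sum(breakdown.values())
    return (breakdown["cache_write"] + breakdown["cache_read"]) / total if total else 0.0


if __name__ == "__main__":
    rates = Rates(input=3.0, output=15.0, cache_write=3.75, cache_read=0.30)  # placeholder prices
    call = cost_breakdown(rates, input_tokens=2_000, output_tokens=1_000,
                          cache_write=20_000, cache_read=400_000)
    print(call, f"cache share: {cache_share(call):.0%}")
```

Summing only input and output tokens would hide the cache line items entirely, which is the gap the separate columns close.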
Shipped in 0.5.x:
- Lens Visualizations — an Analytics pivot explorer (metric × dimension × chart, presets, CSV), stacked cost-by-model, a cache-savings chart, KPI sparklines, a cost-annotated trace waterfall, and consistent series coloring
- Merged Dashboard — the explorer and the triage front-door unified into one default screen, with in-place drill-through from recoverable-waste tiles
- First-run polish — backfill fidelity (session-level traces, cache read/write split, honest session counts), plan-tier-aware framing throughout (subscription users see token-share, never raw spend), an onboarding welcome banner + next-steps guidance, and a contribution funnel
Up next (roughly):
- Continued Lens polish + per-product visual branding
- tj policy add | edit | apply — unified rule surface
- tj replay — replay captured sessions against new model versions
- TypeScript framework patches (LangChain JS, OpenAI Agents SDK)
- Vercel AI SDK & Mastra integrations
- Docker image
- GitHub Actions for CI drift/cost checks
TokenJam is MIT, and contributions are welcome — from a one-line pricing fix to a whole new framework integration. A few easy on-ramps:
- 🟢 Good first issues → — scoped, newcomer-friendly tasks, ready to pick up.
- 💸 Model pricing — tokenjam/pricing/models.toml is community-maintained. Fix a rate or add a model in a single PR — no issue needed.
- 🔌 Framework integrations — provider/framework patches follow one clear pattern (tokenjam/sdk/integrations/anthropic.py is the reference). Open an issue first to align on approach.
- 🤖 Built with coding agents — TokenJam is built by AI coding agents, and contributing with one is first-class. Claude Code: read CLAUDE.md and run /init to bring your agent up to speed. Codex / other agents: AGENTS.md has the critical rules.
Setup and the full dev workflow are in CONTRIBUTING.md.
tokenjam.dev · PyPI · npm · Issues
MIT License · Built by Metabuilder Labs
TokenJam was created by Anil Murty — reach him at anil@metabldr.com.





