GitHub - Metabuilder-Labs/tokenjam: Token Efficiency For AI Agents

Token Efficiency For AI Agents

TokenJam reads your agent's telemetry and tells you when to downsize, when to trim prompts, what to cache, what to script, and what plans you've already paid to figure out — then shows it all in a local browser dashboard. Runs entirely on your machine.

pipx install tokenjam

Don't have pipx? brew install pipx on macOS, apt install pipx on Debian/Ubuntu, or see docs/installation.md. pip install tokenjam also works in a clean venv.

No cloud · No signup · No vendor lock-in

⭐ If TokenJam saves you tokens, star it · 👁 Watch for releases — we ship often

Five Analyzers + Lens. One Install.

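TokenJam reads telemetry from agent runtimes (Claude Code, Codex), SDK-instrumented Python and TypeScript agents and frameworks (LangChain, CrewAI, and others), any OTLP exporter, and observability tools such as Langfuse and Helicone. It surfaces savings across five areas, then brings them together in a local browser dashboard.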

🪶 Downsize

Flags sessions where a cheaper model in the same family is worth a look. Never claims quality equivalence — surfaces examples so you can spot-check.

tj optimize downsize

Details →

💾 Cache

Shows your current caching ratio per (provider, model) and suggests Anthropic prompt-cache breakpoints from stable prefixes in your real usage.

tj optimize cache

Details →
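To make the two ideas above concrete, here is a minimal stdlib-only sketch of (a) a caching ratio per (provider, model) and (b) the longest prefix shared by a set of prompts, the kind of stable region a cache breakpoint could cover. The record fields, the placeholder model names, and the assumption that input_tokens excludes cache reads (as in Anthropic's usage fields) are illustrative; this is not TokenJam's schema or formula.

```python
import os
from collections import defaultdict

# Hypothetical usage records: field and model names are placeholders, not TokenJam's schema.
# Assumes input_tokens excludes cache reads, as in Anthropic's usage fields.
records = [
    {"provider": "anthropic", "model": "model-a", "input_tokens": 200, "cache_read_tokens": 800},
    {"provider": "anthropic", "model": "model-a", "input_tokens": 1000, "cache_read_tokens": 0},
    {"provider": "openai", "model": "model-b", "input_tokens": 500, "cache_read_tokens": 500},
]


def cache_ratio_by_model(rows):
    """Share of prompt tokens served from cache, per (provider, model)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [cached tokens, all prompt tokens]
    for r in rows:
        key = (r["provider"], r["model"])
        totals[key][0] += r["cache_read_tokens"]
        totals[key][1] += r["input_tokens"] + r["cache_read_tokens"]
    return {k: cached / total if total else 0.0 for k, (cached, total) in totals.items()}


def stable_prefix(prompts):
    """Longest character prefix shared by every prompt: a candidate region to cache."""
    return os.path.commonprefix(prompts)


print(cache_ratio_by_model(records))
print(repr(stable_prefix(["SYSTEM: be terse.\nTask: A", "SYSTEM: be terse.\nTask: B"])))
```

Anthropic prompt caching is configured by marking content blocks with cache_control, so in practice a breakpoint sits on a block boundary at or before the point where the shared prefix ends, not at an arbitrary character.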

📜 Script

Finds clusters of deterministic (tool_name, arg_shape) sequences that match the shape of work a plain script could replace.

tj optimize script

Details →
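A rough sketch of the Script idea: reduce each tool call to a (tool_name, arg_shape) pair, where the shape keeps argument names and value types but drops the values, then count short sequences of those pairs that recur across sessions. The shape definition, the window length, the min_count threshold, and the tool names are assumptions for illustration; TokenJam's actual clustering may differ.

```python
from collections import Counter


def arg_shape(args: dict) -> tuple:
    """Argument names and value types, with the concrete values dropped."""
    return tuple(sorted((name, type(value).__name__) for name, value in args.items()))


def repeated_sequences(sessions, length=2, min_count=2):
    """Count length-n windows of (tool_name, arg_shape) steps that recur across sessions."""
    counts = Counter()
    for calls in sessions:
        steps = [(c["tool"], arg_shape(c["args"])) for c in calls]
        # Count each distinct window once per session so one long session can't dominate.
        windows = {tuple(steps[i:i + length]) for i in range(len(steps) - length + 1)}
        counts.update(windows)
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical tool calls from three sessions.
sessions = [
    [{"tool": "read_file", "args": {"path": "a.py"}}, {"tool": "run_tests", "args": {"target": "a"}}],
    [{"tool": "read_file", "args": {"path": "b.py"}}, {"tool": "run_tests", "args": {"target": "b"}}],
    [{"tool": "search", "args": {"query": "todo"}}],
]
print(repeated_sequences(sessions))
```

The first two sessions read different files, but their shapes match, which is what makes the sequence a plausible candidate for a plain script.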

✂️ Trim

Predicts which regions of your prompts the model gives little weight to. Surfaces what's safe to cut.

tj optimize trim

Details →

🔁 Reuse

Detects clusters of sessions where your agent re-plans the same work and exports reviewable skeleton templates you can drop into a slash command or script.

tj optimize reuse

Details →
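One simple way to detect re-planning, sketched here as an assumption rather than TokenJam's method: represent each session's plan as a set of normalized step strings, then greedily group sessions whose Jaccard similarity to a cluster's first member meets a threshold. The plan representation, the session ids, and the 0.6 threshold are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two step sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_plans(plans: dict, threshold: float = 0.6) -> list:
    """Greedy single-pass clustering; returns only clusters with repeated work."""
    clusters = []  # each item: (representative step set, [session ids])
    for session_id, steps in plans.items():
        step_set = {s.strip().lower() for s in steps}
        for rep, members in clusters:
            if jaccard(rep, step_set) >= threshold:
                members.append(session_id)
                break
        else:
            clusters.append((step_set, [session_id]))
    # A cluster with a single session is not repeated planning, so drop it.
    return [members for _, members in clusters if len(members) > 1]


plans = {
    "s1": ["Read config", "Bump version", "Update changelog", "Tag release"],
    "s2": ["read config", "bump version", "update changelog", "tag release", "push"],
    "s3": ["Refactor parser"],
}
print(cluster_plans(plans))
```

In this toy data, the steps shared by every member of the cluster (the four release steps) are the natural starting point for a reviewable skeleton template.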

🔭 Lens

A local browser dashboard that brings every analyzer's findings, your real spend, and your alerts together in one place. No cloud, no signup, fully offline.

tj serve

Details →

Run all five analyzers with tj optimize. Run several with tj optimize downsize cache reuse.

30-second quickstart

For Claude Code users — zero code, auto-backfills your last 30 days:

pipx install tokenjam
tj onboard --claude-code
tj optimize          # cost-saving candidates from your actual usage
tj serve             # open the dashboard at http://127.0.0.1:7391/

That's it. Run tj any time and it points you to the next best action:

 _____    _              _
|_   _|__| |_____ _ _   | |__ _ _ __
  | |/ _ \ / / -_) ' \  | / _` | '  \
  |_|\___/_\_\___|_||_|_/ \__,_|_|_|_|
                     |__/
  TokenJam · cost-optimization for AI agents · local-first, OTel-native · no signup

You're set up. Next best actions:
  tj status      agent overview — what's running, recent cost
  tj tokenmaxx   your shareable spend tier
  tj optimize    cost-saving candidates from your usage
  tj serve       open Lens (web UI) at http://127.0.0.1:7391/

To upgrade later: pipx upgrade tokenjam (then tj stop && tj serve & to reload the daemon, and tj --version to verify). See docs/installation.md.

For any Python agent:

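from tokenjam.sdk import watch
from tokenjam.sdk.integrations.anthropic import patch_anthropic

# Patch the Anthropic SDK so TokenJam records its calls; do this before the agent runs.
patch_anthropic()

@watch(agent_id="my-agent")  # attribute this function's runs to "my-agent"
def run(task: str) -> str:
    ...  # your agent logic, e.g. Anthropic client calls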

→ Python SDK · TypeScript SDK · Codex · OTel-compatible agents

Lens — the local dashboard

tj serve runs Lens at http://127.0.0.1:7391/. Its default Dashboard puts recoverable waste and overall health on one screen and embeds an explorer for slicing your usage any way you like (metric × dimension × chart). Lens also includes Status, Traces, Cost, Analytics, Alerts, Drift, Optimize, and Budget screens. It is plan-tier-aware, fully offline, and needs no signup.

→ tokenjam.dev/products/lens for the visual walkthrough.
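Slicing a metric by a dimension, as the explorer does, is essentially a group-by. A minimal stdlib sketch, assuming hypothetical span records with agent, model, and cost_usd fields (not Lens's actual data model), that pivots one metric by one dimension and renders the result as CSV:

```python
import csv
import io
from collections import defaultdict


def pivot(rows, metric, dimension):
    """Sum `metric` for each value of `dimension`, largest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def to_csv(pairs, dimension, metric):
    """Render (dimension value, total) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([dimension, metric])
    writer.writerows(pairs)
    return buf.getvalue()


# Hypothetical records; field and model names are placeholders.
rows = [
    {"agent": "triage", "model": "model-a", "cost_usd": 0.25},
    {"agent": "triage", "model": "model-b", "cost_usd": 0.5},
    {"agent": "coder", "model": "model-a", "cost_usd": 1.0},
]
print(to_csv(pivot(rows, "cost_usd", "model"), "model", "cost_usd"))
```

Swapping the dimension argument (model for agent, say) is all it takes to re-slice the same data.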

Beyond optimization

TokenJam is also a full observability stack. The five analyzers and Lens ride on top.

Real-time cost tracking — every LLM call priced as it happens
Safety alerts — 13 alert types, 6 channels (ntfy, Discord, Telegram, webhook, file, stdout)
Behavioral drift detection — Z-score baselines, no LLM required
Schema validation — declare or infer JSON Schema for tool outputs
OTel-native — point any OTLP exporter at tj serve and you're done
MCP server — lets Claude Code query its own telemetry mid-session
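A Z-score baseline like the one in the drift bullet fits in a few lines: compare a new value of a behavioral metric (for example, tool calls per session) against the mean and sample standard deviation of a trailing baseline, and flag it when |z| reaches a threshold. The metric, the baseline data, and the threshold of 3 are assumptions for illustration, not TokenJam's defaults.

```python
import math
from statistics import mean, stdev


def z_score(baseline, value):
    """How many baseline standard deviations `value` sits from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two points
    if sigma == 0:
        # A perfectly flat baseline: any change is an unbounded deviation.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma


def is_drifting(baseline, value, threshold=3.0):
    """Flag values at least `threshold` standard deviations from the baseline."""
    return abs(z_score(baseline, value)) >= threshold


# Hypothetical tool calls per session over the past week.
baseline = [10, 12, 11, 9, 13, 10, 12]
print(is_drifting(baseline, 25), is_drifting(baseline, 12))
```

No model is involved: the check is plain arithmetic over recorded telemetry, which is what keeps it cheap enough to run on every session.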

CLI

tj optimize            # all five cost-optimization analyzers
tj optimize downsize   # one analyzer (positional args)
tj tokenmaxx           # shareable spend-tier callout
tj status              # current cost, tokens, active alerts
tj cost --since 7d     # spend by agent / model / day / tool
tj alerts              # everything that fired while you were away
tj drift               # behavioral drift Z-scores
tj report --reuse      # HTML + Markdown skeleton export for the Reuse analyzer
tj backfill claude-code # ingest historical ~/.claude/projects/ sessions
tj serve               # start Lens + REST API

Full CLI reference →
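For a sense of what a Claude Code backfill involves, here is a sketch that totals token usage from JSONL session logs like those under ~/.claude/projects/. It assumes each assistant line carries a message.usage object with Anthropic-style counters (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens). That layout is an assumption about Claude Code's local files, which may change between versions, and this is not how tj backfill is implemented.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed Anthropic-style usage counters inside each assistant entry.
USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")


def total_usage(lines) -> Counter:
    """Sum usage counters over JSONL lines, skipping lines that don't parse or lack usage."""
    totals = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        message = entry.get("message") if isinstance(entry, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for key in USAGE_KEYS:
            totals[key] += int(usage.get(key) or 0)
    return totals


def usage_for_project_dir(project_dir: Path) -> Counter:
    """Total usage across every *.jsonl session file in one project directory."""
    totals = Counter()
    for path in project_dir.glob("*.jsonl"):
        with path.open(encoding="utf-8") as fh:
            totals.update(total_usage(fh))
    return totals
```

Real logs may repeat the same usage object across several lines belonging to one response, so a production importer would likely need to deduplicate (for example by message id); this sketch does not.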

Documentation

Topic	Where
🪶 Downsize / Cache / Script / Trim deep-dives	docs/optimize/
🔁 Reuse analyzer deep-dive	docs/optimize/reuse.md
Claude Code & Codex integration	docs/claude-code-integration.md
Python SDK reference	docs/python-sdk.md
TypeScript SDK reference	docs/typescript-sdk.md
Framework support (LangChain / CrewAI / etc.)	docs/framework-support.md
Alert channels & rule reference	docs/alerts.md
Backfill from Langfuse / Helicone / OTLP	docs/backfill/
Configuration	docs/configuration.md
Architecture deep-dive	docs/architecture.md
Installation extras (Trim, framework patches)	docs/installation.md
Export to Grafana / Datadog / NDJSON	docs/export.md
NemoClaw sandbox observer	docs/nemoclaw-integration.md
Release notes	GitHub Releases

Roadmap

Shipped in 0.3.x: Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview

Shipped in 0.4.x:

TokenJam Lens — local dashboard rebrand: Overview triage front-door, Optimize detail tab, real spend-over-time charts, cross-screen drill-through
Reuse analyzer — fifth analyzer: detects clusters of sessions with repeated planning, exports reviewable skeleton templates you can convert into slash commands or scripts
Daemon DB concurrency — per-thread DuckDB cursors so the Overview's fan-out doesn't block on a single shared connection (v0.4.1)
Cache cost transparency — cache_read + cache_write token columns surfaced in CLI + UI + API (the previously hidden ~91% cost driver on cache-heavy workloads)
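Why the cache columns matter: on cache-heavy workloads, cache reads and writes can carry most of a session's tokens, so a cost view built only from input and output tokens understates spend. A minimal sketch of per-category pricing; the rates are made-up placeholders in USD per million tokens, not real provider prices and not values from TokenJam's pricing table.

```python
# Placeholder rates in USD per million tokens; illustrative only, not real prices.
RATES = {"input": 2.00, "output": 10.00, "cache_write": 2.50, "cache_read": 0.20}


def cost_breakdown(tokens: dict) -> dict:
    """Dollar cost per token category, given token counts keyed like RATES."""
    return {kind: tokens.get(kind, 0) * rate / 1_000_000 for kind, rate in RATES.items()}


def cache_share(tokens: dict) -> float:
    """Fraction of total cost that comes from cache reads and writes."""
    costs = cost_breakdown(tokens)
    total = sum(costs.values())
    return (costs["cache_read"] + costs["cache_write"]) / total if total else 0.0


# Hypothetical token counts for one cache-heavy session.
session = {"input": 20_000, "output": 10_000, "cache_write": 200_000, "cache_read": 2_000_000}
print(cost_breakdown(session))
print(round(cache_share(session), 3))
```

With these placeholder numbers, cache traffic accounts for about 87% of the session's cost, a share that stays invisible if only the input and output columns are shown.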

Shipped in 0.5.x:

Lens Visualizations — an Analytics pivot explorer (metric × dimension × chart, presets, CSV), stacked cost-by-model, a cache-savings chart, KPI sparklines, a cost-annotated trace waterfall, and consistent series coloring
Merged Dashboard — the explorer and the triage front-door unified into one default screen, with in-place drill-through from recoverable-waste tiles
First-run polish — backfill fidelity (session-level traces, cache read/write split, honest session counts), plan-tier-aware framing throughout (subscription users see token-share, never raw spend), an onboarding welcome banner + next-steps guidance, and a contribution funnel

Up next (roughly):

Continued Lens polish + per-product visual branding
tj policy add | edit | apply — unified rule surface
tj replay — replay captured sessions against new model versions
TypeScript framework patches (LangChain JS, OpenAI Agents SDK)
Vercel AI SDK & Mastra integrations
Docker image
GitHub Actions for CI drift/cost checks

Contributing

TokenJam is MIT, and contributions are welcome — from a one-line pricing fix to a whole new framework integration. A few easy on-ramps:

🟢 Good first issues → — scoped, newcomer-friendly tasks, ready to pick up.
💸 Model pricing — tokenjam/pricing/models.toml is community-maintained. Fix a rate or add a model in a single PR — no issue needed.
🔌 Framework integrations — provider/framework patches follow one clear pattern (tokenjam/sdk/integrations/anthropic.py is the reference). Open an issue first to align on approach.
🤖 Built with coding agents — TokenJam is built by AI coding agents, and contributing with one is first-class. Claude Code: read CLAUDE.md and run /init to bring your agent up to speed. Codex / other agents: AGENTS.md has the critical rules.

Setup and the full dev workflow are in CONTRIBUTING.md.


tokenjam.dev · PyPI · npm · Issues

MIT License · Built by Metabuilder Labs

TokenJam was created by Anil Murty — reach him at anil@metabldr.com.
