Spec-driven Claude Code workflow with a verification pipeline. Worker → verifier → fix-agent retry, by default.
dwarves-kit drives a Claude Code project through one spec-driven lifecycle: think → spec → execute → review → ship → retro. Every build task runs a verification pipeline (worker → verifier → fix-agent retry), and hooks enforce safety automatically (rm -rf, push-to-main, force-push, and secret-file reads are blocked).
You drive it by intent, not by memorizing commands. Say what you want; the kit reads your intent, runs the right step, and stops only at the real decisions:
You: "add a --version flag to the CLI"
Kit: scopes it -> writes the spec -> builds + verifies -> ships,
pausing only where it needs your call.
The /kit:* commands below are those same actions named explicitly, for when you prefer to type them. You rarely need to.
It is bash-first (every hook readable in 30 seconds), and every component traces to a proven pattern, no novel inventions. The point of the kit is the handoff: a solo technical lead writes the spec, a contractor runs /kit:execute against the same spec.
New here? Install below, then run your first cycle. The full operator reference (every command, hook, and agent, plus troubleshooting) lives in MANUAL.md.
In any Claude Code session:
/plugin marketplace add dwarvesf/dwarves-kit
/plugin install kit@dwarves-marketplace
That's it. Hooks, commands, agents, and the skill all install automatically. No bash, no jq, no symlinks. Updates via /plugin update kit.
To get the kit listed on Anthropic's official marketplace (claude-plugins-official), submit it via claude.ai/settings/plugins/submit. One-time manual step; not blocking the self-hosted install above.
For environments without Claude Code's plugin system (CI, project templates, older Claude Code versions):
git clone https://github.com/dwarvesf/dwarves-kit.git ~/.claude/dwarves-kit
cd ~/.claude/dwarves-kit && bash install.shRequires jq (for settings merge) and git. Cloning in place is simplest, but install.sh also runs from a checkout anywhere. To uninstall: bash ~/.claude/dwarves-kit/install.sh --uninstall.
Don't run both paths on the same machine, hooks would register twice. The plugin install does not configure statusLine (not in the v1 plugin schema); use the bash install if you want the statusline HUD.
Invocation differs by path: installed as the plugin, commands are namespaced /kit:<name> (e.g. /kit:spec); via the bash installer they resolve bare /<name> (e.g. /spec). This README uses the plugin form.
After install, open a Claude Code session in your project and run one full lap. A tiny change is the best first try:
/kit:startorients you and suggests the next step./kit:thinkand describe the change (e.g. "add a--versionflag to the CLI"). It throws 6 forcing questions at the idea; answer them./kit:specwrites the contract todocs/specs/SPEC-NNN-<slug>.md./kit:executeruns the autonomous build: a worker implements the spec, a verifier checks it against the acceptance criteria, a fix-agent retries fixable failures (max 2)./kit:reviewthen/kit:ship: review gate, then version bump, changelog, conventional commit, PR.
That is the whole loop. The spec is the unit of handoff: a contractor running /kit:execute reads the same docs/specs/SPEC-NNN-<slug>.md you wrote. To see the artifact set without running anything, browse examples/hello-spec/.
/kit:start Detect state, suggest next command (entry point)
/kit:think Challenge the idea (5 min)
/kit:design Opt-in: shape the solution with you before /spec
/kit:spec Generate the spec + 4 parallel researchers (15-30 min)
/kit:spec-validate Stress-test the spec (10 min)
[hand off to contractor]
/kit:execute Autonomous: worker > verifier > fix-agent retry loop
or
/kit:next Manual: pick next task, load context, you drive
[hooks enforce during build]
[statusline shows context budget]
[session-state-save persists progress on every stop]
[slop-cleaner flags bloat at stop points]
/kit:review Single-pass review (10 min)
/kit:review-team Parallel 3-lens review (security + arch + tests)
/kit:verify Re-run the tests read-only, no rebuild (on demand)
/kit:docs Update all docs to match code (5 min)
/kit:ship Review gate, version bump, changelog, commit, PR
/kit:retro Retrospective (10 min, after shipping)
Work is sized by risk lane before it starts (tiny / normal / full / bug, plus a backfill lane for reviewing an existing codebase and writing the operating-layer docs without changing behavior). The lanes, the gate at each phase boundary, and the operate-contract the agent follows live in AGENTS.md and WORKFLOW.md.
Every task goes through: worker > task-verifier > fix-agent (if needed).
worker subagent completes task
-> task-verifier (read-only) checks acceptance criteria + tests
-> PASS: mark done, continue
-> FAIL:fixable: dispatch fix-agent (max 2 retries)
-> FAIL:escalate: stop, ask human
Within one spec, tasks run sequentially. Across specs, /kit:dispatch fans out disjoint VALIDATED specs into parallel git worktrees behind a disjointness gate; across sessions, a passive goal registry (lib/goal-registry.sh) keeps concurrent same-machine sessions from colliding. The kit deliberately stops short of a DAG scheduler, a coordinating daemon, or cross-machine orchestration. For those, run GSD v2 or Nimbalyst alongside it.
Hooks (14, automatic, event-triggered)
| Hook | Event | What it does |
|---|---|---|
| safety-gate | PreToolUse(Bash) | Blocks rm -rf (build-artifact allowlist), push to main, force push, DROP TABLE, git reset --hard, kubectl delete |
| secrets-guard | PreToolUse(Read|Edit|Bash) | Blocks reads of secret files (.env, ~/.ssh, ~/.aws, .pem); canonicalizes the path first |
| commit-format | PreToolUse(Bash) | Blocks non-conventional / >72-char / spec-ID commit subjects |
| context-readiness | SessionStart | Detects project state, suggests next command |
| anti-rationalization | Stop | Catches Claude declaring work done prematurely |
| slop-cleaner | Stop | Flags bloated code in recently modified files |
| session-state-save | Stop, SubagentStop | Persists session state, rotates last 10 archives |
| auto-format | PostToolUse(Write|Edit) | Runs formatter on every file change |
| spec-drift-guard | PreToolUse(Write) | Warns when creating files not in the spec |
| pre-compact-backup | PreCompact | Saves structured session snapshot before compaction |
| post-compact-reinject | PostToolUse(compact) | Re-injects critical rules after compaction |
| notification | Notification | Desktop alert when Claude needs input |
| permission-auto-approve | PermissionRequest | Auto-approves read-only operations (pipe-safe) |
| statusline | StatusLine | Shows model, branch, context %, cost, thinking mode |
Commands (22, manual, human-triggered)
| Command | Phase | What it does |
|---|---|---|
| /kit:start | Entry | Detect project state, suggest next command |
| /kit:think | Think | 6 forcing questions to stress-test an idea |
| /kit:design | Design | Opt-in: interactive solution-design beat (one question at a time) before /spec |
| /kit:devs-team | Design | Opt-in: 5-lens parallel critique of the solution (brief or spec), report-only |
| /kit:visual-team | Design | Opt-in: 5-lens parallel critique of a visual/UI design (downstream-facing) |
| /kit:ui-design | Design | Opt-in, downstream: UI brief -> generate (frontend-design) -> critique -> revise loop |
| /kit:assign | Orchestrate | Turn a backlog item (ID-NNN) into a scoped goal draft + route it into the lane |
| /kit:dispatch | Orchestrate | Fire N disjoint VALIDATED specs concurrently, each in its own worktree, behind a disjointness gate; lead-owned merge |
| /kit:spec | Spec | Generate docs/specs/SPEC-NNN-.md with 4 parallel research agents |
| /kit:spec-validate | Spec | 5 adversarial reviewers attack the spec (incl. solution-design + extensibility) |
| /kit:test-plan | Spec | Opt-in: coverage matrix from acceptance criteria into the spec's ## Test plan section |
| /kit:execute | Build | Autonomous: worker > verifier > fix-agent retry loop |
| /kit:next | Build | Lightweight: picks next undone task, loads context, you drive |
| /kit:verify | Verify | Read-only re-run of task-verifier + integration-checker on the active spec, no rebuild |
| /kit:debug | Bug (off-cycle) | Systematic debug loop: root cause before any fix, evidence ledger, 3-fix wall |
| /kit:review | Review | Paranoid single-pass code review |
| /kit:review-team | Review | Parallel 3-lens review (security + architecture + test-coverage) |
| /kit:docs | Docs | Cross-reference diff against all doc files, fix drift |
| /kit:ship | Ship | Review gate, version bump, changelog, commit, PR |
| /kit:retro | Reflect | What worked, what hurt, action items for next cycle |
| /kit:kit-health | Meta | Self-assessment against kit philosophy |
| /kit:absorb | Meta | Maintainer-only: audit upstream sources (Credits drift + seed-rescan) + draft a dated absorption proposal |
Agents (11, dispatched by commands) and Skill (1, Claude-triggered)
| Agent | Dispatched by | What it does |
|---|---|---|
| task-verifier | /execute | Read-only verification against spec + tests |
| integration-checker | /execute, /verify | Read-only cross-task wiring + global acceptance check (multi-task specs) |
| fix-agent | /execute | Targeted fixes on FAIL:fixable (max 2 retries) |
| reviewer | /review-team | Focused review with configurable lens |
| security-auditor | /review-team | Deep OWASP-style security audit |
| responding-to-review | /review-team | Verifies review findings, pushes back when wrong, proposes fixes (no performative agreement) |
| doc-verifier | /docs | Read-only check that docs match the live codebase |
| research-stack | /spec | Maps technology stack (brownfield) |
| research-features | /spec | Maps existing features in target area |
| research-architecture | /spec | Maps architecture patterns and conventions |
| research-pitfalls | /spec | Finds landmines before implementation |
| Skill | What it does |
|---|---|
| get-api-docs | Fetches curated API docs via Context Hub before coding |
A solo technical lead handing off implementation to contractors. The kit covers the full lifecycle with one shared spec format: the contractor running /kit:execute reads the same docs/specs/SPEC-NNN-<slug>.md you wrote with /kit:spec.
Also for a builder using Claude Code 6-8 hours a day who wants a context-budget HUD, automatic safety guards, session-state persistence across compaction, and slop detection at stop points.
- Teams of 10+ with a dedicated DevOps pipeline. The kit targets one engineer (or one lead + delegated contractors); one lead can still fan out parallel workers (
/kit:dispatch) and run concurrent same-machine sessions safely. Cross-machine orchestration, 3+ live operators, or goal-ordering chains are out of scope, pair the kit with Nimbalyst or Conductor for that. - Anyone who wants a UI. The kit is bash hooks + markdown commands. Open any file in a text editor; it's all readable.
- Projects already happy with GSD, gstack, or Trail of Bits' configs as standalone tools. The kit's value is integration; if format-translation overhead between standalone tools isn't hurting you, don't switch.
Directory layout
dwarves-kit/
tool.toml Kit metadata (name, version, language=bash, deps)
AGENTS.md Tool-agnostic operate-contract front door (any runtime reads it first)
WORKFLOW.md The cycle, the risk-tier lanes, the gates, and the flow/loop reference (ASCII diagrams)
MANUAL.md Operator reference: commands, hooks, agents, natural-language scenarios, troubleshooting
README.md / CONTRIBUTING.md / CHANGELOG.md / VERSION / LICENSE
CLAUDE.md Project template; the Claude-Code layer on top of AGENTS.md
install.sh / settings.json Bash install path
.claude-plugin/ Plugin install path (plugin.json, marketplace.json)
.github/workflows/test.yml CI: macOS + Ubuntu test matrix
agents/ (11 files) Subagents dispatched by commands
commands/ (22 markdown command prompts)
hooks/ (14 scripts + hooks.json plugin manifest)
lib/dispatch-gate.sh Disjointness gate + drift guard for /kit:dispatch (pure-bash concurrency moat)
lib/lane-classify.sh Deterministic task-type -> risk-lane classifier (used by /kit:assign + /kit:dispatch)
lib/goal-registry.sh Cross-session running-goal registry: claim/list/log/release (multi-session moat + monitor)
lib/goal-drafts.sh Goal-draft lifecycle: archive shipped drafts to .claude/goals/done/
skills/get-api-docs/ Context Hub integration
rules/ Path-scoped coding-standard templates
examples/hello-spec/ Demo: small CLAUDE.md + SPEC.md walkthrough
tests/test-hooks.sh Hook behavior assertions
tests/test-meta.sh Structural integrity (manifests, frontmatter, cross-links)
docs/specs/SPEC-NNN-<slug>.md Specs, tracked in place via Status header (DRAFT/VALIDATED/SHIPPED); hooks pick the active one by git branch
docs/ The kit's design record (not needed to USE the kit; see docs/README.md)
README.md Map of docs/: what each file is, and that you can skip it to use the kit
PHILOSOPHY.md Design principles, target user, rejection list
architecture.md Components, data flow, the SDLC state machine, Collaborative Design Protocol, deps
decisions/ One ADR per file (NNNN-<slug>.md); supersession recorded in the Status line
specs/ Specs (SPEC-NNN-<slug>.md); also the live spec store the hooks detect
retro/ Per-cycle retrospectives (output of /kit:retro)
research/ Dated deep-scans that fed specific specs
_meta/BACKLOG.md Phased task backlog
For the full file listing including individual agent/hook/command names, run git ls-files or browse the repo on GitHub.
Debug mode. Set DWARVES_KIT_DEBUG=1 and every hook logs its decisions to stderr. Useful when a hook misbehaves or you want to understand why something was blocked or approved.
Hook logs. Hooks that make enforcement decisions append to ~/.claude/dwarves-kit/logs/ (anti-rationalization.log, safety-gate.log, spec-drift-guard.log, slop-cleaner.log). These build the eval corpus for future optimization.
Testing. bash tests/test-hooks.sh covers hook behavior (safety-gate blocking, anti-rationalization patterns, permission-auto-approve pipe-injection protection); bash tests/test-meta.sh covers structural integrity (manifests, frontmatter, cross-links).
External dependencies (install alongside, not bundled):
- Context Hub -
npm install -g @aisuite/chub - Context7 - MCP server for library docs
- codebase-memory-mcp - AST-level codebase indexing
- Prompt-type anti-rationalization hook (Haiku evaluation instead of grep patterns)
- /qa command with headless browser testing (requires Playwright)
- Intra-spec parallel task dispatch in /execute (cross-goal fan-out across specs already ships as /kit:dispatch; this is the deferred intra-spec case)
- SessionEnd hook for automatic knowledge capture
- Multi-harness packaging (Codex / Cursor / Gemini / OpenCode), deferred until real demand
See CHANGELOG.md. It's the source of truth; the README does not duplicate it.
Patterns extracted from:
- GSD v1 / get-shit-done - spec generation, the original planning-dir convention (since unified onto docs/specs/), 4 parallel researchers. Distinct from GSD v2 (gsd-build/gsd-2, npm
gsd-pi), a separate standalone agent on the Pi SDK referenced as an external execution runtime, not a pattern source - gstack - /office-hours, /review, /ship patterns; the /kit:ui-design loop shapes (brief schema, injection-wrap, accumulated-feedback)
- frontend-design - the external UI generator /kit:ui-design delegates to; its aesthetic-direction brief shape
- ui-ux-pro-max-skill - /kit:ui-design brief sub-shapes (token ladder, states matrix, a11y bars, voice); generator + tooling rejected per bash-over-binaries
- Trail of Bits - hook implementations, code quality rules, statusline pattern
- ClaudeKit - validation gate, adversarial review, session-state pattern
- Context Hub - API docs skill
- oh-my-claudecode - HUD/statusline, slop-cleaner pattern
- Claude-Code-Game-Studios - /start router, path-scoped rules, Collaborative Design Protocol
- Smart Ralph - fix-agent retry pattern (fail-fix-re-verify loop)
MIT