Pivotal Token Search
Updated Dec 20, 2025 - Python
Adversarial Manipulation of CoT
Mech-interp suite for Granite4 models, which use the Mamba-2 architecture
Analysed determinism, faithfulness, reasoning patterns, and steering. Developed and tested methods to improve control and strengthen fail-safes.
Official implementation of the 'Uncovering Competency Gaps in Large Language Models and Their Benchmarks' paper
Detect safety degradation during LLM fine-tuning before it shows up in model behavior
All code, stimuli, and results for a mechanistic interpretability study investigating how large language models internally represent emotional content
Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.
A collection of learnings from my journey in artificial intelligence
Unofficial implementation reproducing the experiments from the "Superposition as a Phase Change" section of "Toy Models of Superposition".
Local agent-driven mechanistic interpretability research platform for Apple Silicon
Cross-architecture mechanistic interpretability toolkit with the first open-source Mamba SSM state extraction. Works on transformer, SSM, and hybrid models through a unified API.
Automated Forensic Discovery of Reasoning Circuits in Transformers