mech-interp

All code, stimuli, and results for a mechanistic interpretability study investigating how large language models internally represent emotional content

ai-psychology mech-interp

Updated Mar 17, 2026
Python

ashioyajotham / exploring_saes

Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.

sparse-autoencoders interpretability activation-functions neuron-activity wandb transformerlens mech-interp

Updated Nov 21, 2025
Python

ashioyajotham / ai_research

A collection of learnings from my journey in Artificial Intelligence

reinforcement-learning alignment ai-safety reasoning red-teaming evals mech-interp

Updated Jun 16, 2026
Jupyter Notebook

coderinblack08 / prompt-helmet

ai cybersecurity mech-interp

Updated Apr 6, 2025
Jupyter Notebook

ymgw55 / repro-superposition

Unofficial implementation reproducing the experiments from the "Superposition as a Phase Change" section of "Toy Models of Superposition".

python neural-network reproducible-research circuit interpretability llm anthropic mech-interp

Updated Aug 17, 2025
Jupyter Notebook

ashlrai / mechanistic-interpretability

Local agent-driven mechanistic interpretability research platform for Apple Silicon

sparse-autoencoders ai-safety acdc interpretability apple-silicon mechanistic-interpretability activation-patching abliteration mech-interp transformer-lens

Updated May 28, 2026
Python

OriginalKazdov / archscope

Cross-architecture mechanistic interpretability toolkit, billed as the first OSS tool for Mamba SSM state extraction. Works on transformer, SSM, and hybrid models through a unified API.

pytorch transformer probes ssm mamba sparse-autoencoders ai-safety interpretability state-space-models cross-architecture llm mechanistic-interpretability activation-patching mech-interp tuned-lens

Updated May 15, 2026
Python

sagnikc395 / circuit-surgeon

Automated Forensic Discovery of Reasoning Circuits in Transformers

pytorch llms mech-interp transformer-lens

Updated Apr 28, 2026
Python

Here are 14 public repositories matching this topic:

codelion / pts

kureha-yamaguchi / reasoning-manipulation

humanjesse / Granite4-Mamba2-Mech-Interp-Suite

1289nav / Exploring-chain-of-thought-reasoning-in-LLMs

maty-bohacek / competency-gaps

Ayesha-Imr / safety-compass

keidolabs / affect-reception

ashioyajotham / exploring_saes

ashioyajotham / ai_research

coderinblack08 / prompt-helmet

ymgw55 / repro-superposition

ashlrai / mechanistic-interpretability

OriginalKazdov / archscope

sagnikc395 / circuit-surgeon