A library for making RepE control vectors
Updated Sep 24, 2025 - Jupyter Notebook
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
This repository collects all relevant resources about interpretability in LLMs
Implementation of the stacked denoising autoencoder in TensorFlow
Pivotal Token Search
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
PyTorch implementations of various types of autoencoders
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
SANSA - sparse EASE for millions of items
Sparse Embedding Compression for Scalable Retrieval in Recommender Systems
Multi-Layer Sparse Autoencoders (ICLR 2025)
TensorFlow Examples
Official Triton kernels for TopK and HierarchicalTopK Sparse Autoencoder decoders.
Sparse autoencoder and regular MNIST classification with mini-batches
Explore visualization tools for understanding Transformer-based large language models (LLMs)
Experiments with Adversarial Autoencoders using Keras
Sparse Autoencoders (SAE) vs CLIP fine-tuning fun.
A tiny, easily hackable implementation of a feature dashboard.
[NeurIPS 2025] This is the official repository for VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Real-time 3D visualisation of SAE feature activations inside GPT-2, token by token