ceselder/nanonla-l24-av-qwen3-8b
The ceselder/nanonla-l24-av-qwen3-8b model is an 8-billion-parameter Natural Language Autoencoder (NLA) activation-verbalizer (AV) for Qwen3-8B, targeting layer 24. Developed by ceselder, it translates residual-stream activation vectors into natural language explanations. It is a supervised warm-start checkpoint that receives activations through a Karvonen norm-matched injection formula, and it is specialized for interpreting internal LLM activations.
nanoNLA Qwen3-8B Activation Verbalizer (Layer 24)
This model, ceselder/nanonla-l24-av-qwen3-8b, is the activation-verbalizer (AV) component of a Natural Language Autoencoder (NLA) for the Qwen3-8B large language model, specifically designed for layer 24. An NLA system maps residual-stream activations to natural language and back. This particular model focuses on the activation → text direction, generating natural language descriptions of what a given activation vector represents.
Key Characteristics & Usage
- Activation Verbalization: Translates internal LLM activation vectors (from Qwen3-8B's layer 24) into human-readable explanations.
- Karvonen Injection: Uses a "Karvonen norm-matched additive injection" formula at a layer-1 residual-stream hook. Correct inference requires serving the model with the same injection method, as detailed in the project's documentation.
- Base Model: Fine-tuned from the Qwen/Qwen3-8B model.
- Training Stage: This is an AV-SFT (supervised warm-start) checkpoint, trained for 1000 steps on a dataset of explanations over qwen3-8b-nla-L24-finefineweb-100k.
- Research Focus: Developed as a research checkpoint, it provides plausible but imperfect explanations, as it is the warm-start stage before further reinforcement learning (RL).
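The injection step named in the list above can be illustrated with a small sketch. This model card does not spell out the formula, so the code assumes one plausible reading of "norm-matched additive injection": the incoming activation vector v is rescaled to the norm of the residual-stream vector h at the hook position and then added to it, giving h' = h + ||h|| * v / ||v||. The function names, the `eps` guard, and the plain-list representation are all hypothetical and chosen for illustration. Consult the project's documentation for the exact formula and hook placement before serving the model.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def norm_matched_inject(residual, activation, eps=1e-8):
    """Add `activation` to `residual` after rescaling it to the residual's norm.

    Assumed form (an illustration, not taken from the model card):
        h' = h + ||h|| * v / ||v||
    `eps` is a hypothetical guard against division by zero.
    """
    if len(residual) != len(activation):
        raise ValueError("residual and activation must have the same dimension")
    scale = l2_norm(residual) / (l2_norm(activation) + eps)
    return [h + scale * v for h, v in zip(residual, activation)]


if __name__ == "__main__":
    h = [3.0, 4.0]    # toy residual-stream vector at the hook position
    v = [0.0, 10.0]   # toy activation vector to be verbalized
    print(norm_matched_inject(h, v))
```

Under this assumed form, the injected term always has roughly the same norm as the residual it is added to, so the result depends on the direction of the activation vector and not on its raw magnitude. In a real serving setup the same arithmetic would run on tensors inside a forward hook on the model, not on Python lists.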
Good For
- LLM Interpretability Research: Ideal for researchers studying the internal workings and interpretability of large language models, particularly Qwen3-8B.
- Understanding Activation Semantics: Provides a tool to verbalize and understand the semantic content encoded within specific layers of an LLM.
- Developing NLA Systems: Serves as a foundational component for building and experimenting with Natural Language Autoencoders.