ceselder/nanonla-l24-av-qwen3-8b

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 3, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

The ceselder/nanonla-l24-av-qwen3-8b model is an 8-billion-parameter Natural Language Autoencoder (NLA) activation verbalizer (AV) for Qwen3-8B, targeting layer 24. Developed by ceselder, it translates residual-stream activation vectors into natural language explanations. It is a supervised warm-start checkpoint trained with a Karvonen norm-matched injection formula, which specializes it for interpreting internal LLM activations.


nanoNLA Qwen3-8B Activation Verbalizer (Layer 24)

This model, ceselder/nanonla-l24-av-qwen3-8b, is the activation-verbalizer (AV) component of a Natural Language Autoencoder (NLA) for the Qwen3-8B large language model, specifically designed for layer 24. An NLA system maps residual-stream activations to natural language and back. This particular model focuses on the activation → text direction, generating natural language descriptions of what a given activation vector represents.

Key Characteristics & Usage

  • Activation Verbalization: Translates internal LLM activation vectors (from Qwen3-8B's layer 24) into human-readable explanations.
  • Karvonen Injection: Uses a "Karvonen norm-matched additive injection" formula at a layer-1 residual-stream hook. Inference must apply the same injection method used in training, as detailed in the project's documentation.
  • Base Model: Fine-tuned from the Qwen/Qwen3-8B model.
  • Training Stage: This is an AV-SFT (supervised warm-start) checkpoint, trained for 1000 steps on a dataset of explanations over qwen3-8b-nla-L24-finefineweb-100k.
  • Research Focus: This is a research checkpoint. Its explanations are plausible but imperfect, because it is the warm-start stage that precedes reinforcement learning (RL).
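
The card names the injection formula but does not spell it out, so the following is only a minimal sketch of what a norm-matched additive injection could look like. It assumes the formula has the form h' = h + ||h|| * v / ||v||, where h is the verbalizer's residual-stream vector at the hooked position and v is the Qwen3-8B layer-24 activation being explained. The function name norm_matched_injection, the eps guard, and the toy vectors are hypothetical and written in plain Python for illustration; the project's documentation remains the authoritative definition.

```python
import math


def norm_matched_injection(residual, activation, eps=1e-8):
    """Add `activation` to `residual` after rescaling it to the residual's norm.

    Assumed form (not confirmed by the model card):
        h' = h + ||h|| * v / ||v||
    """
    if len(residual) != len(activation):
        raise ValueError("residual and activation must have the same dimension")
    res_norm = math.sqrt(sum(x * x for x in residual))
    act_norm = math.sqrt(sum(x * x for x in activation))
    # eps guards against division by zero for an all-zero activation vector.
    scale = res_norm / max(act_norm, eps)
    return [r + scale * a for r, a in zip(residual, activation)]


# Toy 4-dimensional vectors; real residual-stream vectors are far larger.
residual = [3.0, 0.0, 4.0, 0.0]     # norm 5
activation = [0.0, 10.0, 0.0, 0.0]  # norm 10, rescaled to norm 5 before adding
injected = norm_matched_injection(residual, activation)
print(injected)  # [3.0, 5.0, 4.0, 0.0]
```

Under this assumption the injected vector's contribution always has the same magnitude as the residual it is added to, regardless of the raw scale of the source activation. In a real serving setup the same arithmetic would run on tensors inside a forward hook on the layer-1 residual stream.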

Good For

  • LLM Interpretability Research: Ideal for researchers studying the internal workings and interpretability of large language models, particularly Qwen3-8B.
  • Understanding Activation Semantics: Provides a tool to verbalize and understand the semantic content encoded within specific layers of an LLM.
  • Developing NLA Systems: Serves as a foundational component for building and experimenting with Natural Language Autoencoders.