kitft/nla-qwen2.5-7b-L20-av

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Mar 16, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

The kitft/nla-qwen2.5-7b-L20-av model is a 7.6-billion-parameter activation verbalizer (AV) fine-tuned from Qwen/Qwen2.5-7B-Instruct. Developed by Kit Fraser-Taliente et al., it is an interpretability tool that maps hidden-state vectors to natural-language descriptions. It is one half of a Natural Language Autoencoder (NLA) pair and is intended for decoding LLM activations, not for general-purpose language generation.

Natural Language Autoencoder (NLA) Activation Verbalizer

This model, nla-qwen2.5-7b-L20-av, is the activation verbalizer (AV) component of a Natural Language Autoencoder (NLA) pair, developed by Kit Fraser-Taliente et al. It is a 7.6-billion-parameter model fine-tuned from Qwen/Qwen2.5-7B-Instruct, with a context length of 32,768 tokens. Its primary function is to map a hidden-state vector from a large language model's residual stream (for this checkpoint, block 20) to a natural-language description.
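As a rough illustration of what "a hidden-state vector from block 20" could look like in practice, the sketch below pulls one vector out of a Hugging Face Transformers forward pass. Several things here are assumptions rather than facts from this card: that the activations come from Qwen/Qwen2.5-7B-Instruct itself, that "block 20" corresponds to index 20 of the `hidden_states` tuple (index 0 being the embedding output), and that the last token position is the one of interest. The helper names `select_block_output` and `extract_activation` are hypothetical. How the vector is then passed to the verbalizer is not described here, so that step is left out.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_block_output(hidden_states: Sequence[T], block: int) -> T:
    """Return the entry of `hidden_states` holding the output of `block`.

    Assumes the Transformers convention for `output_hidden_states=True`:
    index 0 is the embedding output and index i is the output of the i-th
    transformer block, so a model with N blocks yields N + 1 entries.
    """
    num_blocks = len(hidden_states) - 1
    if not 1 <= block <= num_blocks:
        raise ValueError(f"block must be in 1..{num_blocks}, got {block}")
    return hidden_states[block]


def extract_activation(text: str, block: int = 20, token_index: int = -1):
    """Run the base model on `text` and return one hidden-state vector.

    Assumption: the activations to be verbalized come from the base model
    named below; the model card does not state this explicitly.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Qwen/Qwen2.5-7B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Each entry has shape (batch, sequence, hidden).
    layer = select_block_output(outputs.hidden_states, block)
    return layer[0, token_index]
```

Calling `extract_activation("some prompt")` downloads the full base model, so it is best tried on a machine with enough memory for a 7B checkpoint.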

Key Capabilities

  • LLM Interpretability: Functions as a core tool for understanding what specific activations within an LLM "mean" by translating them into human-readable text.
  • Paired Functionality: Designed to work in conjunction with its counterpart, kitft/nla-qwen2.5-7b-L20-ar (activation reconstructor), to form a complete NLA system for activation decoding and measurement.
  • Activation Decoding: Provides a method to "read out" the semantic content of an LLM's internal representations.
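The pairing of verbalizer and reconstructor described above amounts to a round trip: activation to text, then text back to activation. A minimal sketch of that loop follows, with both models stood in for by plain callables. The function names and the two scores (cosine similarity and a norm-scaled squared error) are illustrative assumptions, not the metrics or API the authors use.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def normalized_mse(original: Vector, reconstructed: Vector) -> float:
    """Squared reconstruction error divided by the squared norm of the original."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    scale = sum(x * x for x in original)
    if scale == 0.0:
        raise ValueError("original vector must be non-zero")
    error = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return error / scale


def round_trip(
    activation: Vector,
    verbalize: Callable[[Vector], str],    # stand-in for the AV model
    reconstruct: Callable[[str], Vector],  # stand-in for the AR model
) -> Tuple[str, float, float]:
    """Verbalize an activation, reconstruct it from the text, and score the result."""
    description = verbalize(activation)
    reconstructed = reconstruct(description)
    return (
        description,
        cosine_similarity(activation, reconstructed),
        normalized_mse(activation, reconstructed),
    )


if __name__ == "__main__":
    # Stub callables only; real use would wrap the AV and AR checkpoints.
    text, cos, err = round_trip(
        [0.5, -1.0, 2.0],
        verbalize=lambda v: "placeholder description",
        reconstruct=lambda t: [0.4, -0.9, 2.1],
    )
    print(text, cos, err)
```

Keeping the models behind plain callables means the scoring logic can be checked with stubs before any 7B checkpoint is loaded.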

Good for

  • Researchers in LLM Interpretability: Ideal for those studying the internal workings and representations of large language models.
  • Analyzing LLM Activations: Useful for generating natural language explanations for specific neural activations.
  • Developing Interpretability Tools: Serves as a foundational component for building more advanced tools to probe and understand LLM behavior.

Note: This model is not a general-purpose language model. It has been repurposed specifically for activation decoding, so it is unsuitable for typical text generation or instruction-following tasks.