kitft/nla-qwen2.5-7b-L20-av
The kitft/nla-qwen2.5-7b-L20-av model is a 7.6 billion parameter activation verbalizer (AV) fine-tuned from Qwen/Qwen2.5-7B-Instruct. Developed by Kit Fraser-Taliente et al., it is designed as an interpretability tool to map hidden-state vectors to natural-language descriptions. This model is part of a Natural Language Autoencoder (NLA) pair and is not intended for general-purpose language generation, but specifically for decoding LLM activations.
Natural Language Autoencoder (NLA) Activation Verbalizer
This model, nla-qwen2.5-7b-L20-av, is the activation verbalizer (AV) component of a Natural Language Autoencoder (NLA) pair, developed by Kit Fraser-Taliente et al. It is a 7.6 billion parameter model fine-tuned from Qwen/Qwen2.5-7B-Instruct, with a context length of 32768 tokens. Its primary function is to map a hidden-state vector taken from a large language model's residual stream (block 20 for this checkpoint) to a natural-language description.
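To make "a hidden-state vector from the residual stream at block 20" concrete, here is a minimal sketch of how such a vector could be captured with the Hugging Face `transformers` library. It rests on several assumptions that the card does not state: that the activations come from Qwen/Qwen2.5-7B-Instruct itself, that "block 20" corresponds to index 20 of the `hidden_states` tuple (index 0 being the embedding output), and that the last token position is the one of interest. The helper names are hypothetical. How the captured vector is then fed to the verbalizer is not described here, so the sketch stops at extraction.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

# Assumption: the model being inspected is the AV's own base model.
SUBJECT_MODEL = "Qwen/Qwen2.5-7B-Instruct"
# Taken from the "L20" in the checkpoint name; the indexing convention is an assumption.
BLOCK = 20


def pick_block_output(hidden_states: Sequence[T], block: int) -> T:
    """Return the entry of `hidden_states` that follows `block` transformer blocks.

    With output_hidden_states=True, Hugging Face causal LMs return a tuple in
    which index 0 is the embedding output and index k is the output of block k.
    """
    if not 0 <= block < len(hidden_states):
        raise IndexError(f"block {block} is outside 0..{len(hidden_states) - 1}")
    return hidden_states[block]


def capture_activation(text: str, block: int = BLOCK):
    """Run the subject model on `text` and return one residual-stream vector."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(SUBJECT_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        SUBJECT_MODEL, torch_dtype=torch.bfloat16
    )
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # Shape of each entry: (batch, sequence_length, hidden_size).
    layer = pick_block_output(outputs.hidden_states, block)
    # Assumption: the vector at the final token position is the one to decode.
    return layer[0, -1]
```

Calling `capture_activation("some prompt")` downloads the full subject model, so it is best run on a machine with enough memory for a 7B-parameter checkpoint.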
Key Capabilities
- LLM Interpretability: Functions as a core tool for understanding what specific activations within an LLM "mean" by translating them into human-readable text.
- Paired Functionality: Designed to work in conjunction with its counterpart, kitft/nla-qwen2.5-7b-L20-ar (activation reconstructor), to form a complete NLA system for activation decoding and measurement.
- Activation Decoding: Provides a method to "read out" the semantic content of an LLM's internal representations.
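The pairing of verbalizer and reconstructor can be pictured as a round trip: activation to text, text back to activation, then a comparison of the two vectors. The sketch below illustrates only that loop. The `verbalize` and `reconstruct` callables are hypothetical stand-ins for the AV and AR models, and cosine similarity and relative squared error are metrics chosen for illustration, not necessarily the ones the authors use.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors of equal length."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def relative_squared_error(original: Vector, rebuilt: Vector) -> float:
    """Squared reconstruction error divided by the original's squared norm."""
    _check_same_length(original, rebuilt)
    reference = sum(x * x for x in original)
    if reference == 0.0:
        raise ValueError("relative error is undefined for a zero original")
    error = sum((x - y) ** 2 for x, y in zip(original, rebuilt))
    return error / reference


def round_trip(
    activation: Vector,
    verbalize: Callable[[Vector], str],      # hypothetical wrapper around the AV
    reconstruct: Callable[[str], Vector],    # hypothetical wrapper around the AR
) -> Tuple[str, float, float]:
    """Verbalize an activation, rebuild it from the text, and score the result."""
    description = verbalize(activation)
    rebuilt = reconstruct(description)
    return (
        description,
        cosine_similarity(activation, rebuilt),
        relative_squared_error(activation, rebuilt),
    )
```

Under this framing, a description that lets the reconstructor recover the original vector closely (cosine near 1, relative error near 0) has captured most of what the activation encodes, which is one way to read "measurement" in the pairing described above.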
Good for
- Researchers in LLM Interpretability: Ideal for those studying the internal workings and representations of large language models.
- Analyzing LLM Activations: Useful for generating natural language explanations for specific neural activations.
- Developing Interpretability Tools: Serves as a foundational component for building more advanced tools to probe and understand LLM behavior.
Note: This model is not a general-purpose language model. It has been repurposed specifically for activation decoding, which makes it unsuitable for typical text generation or instruction-following tasks.