Natural Language Autoencoder (NLA) Activation Verbalizer

This model, nla-qwen2.5-7b-L20-av, is the activation verbalizer (AV) half of a Natural Language Autoencoder (NLA) pair, developed by Kit Fraser-Taliente and his team. It is a 7.6-billion-parameter model fine-tuned from Qwen/Qwen2.5-7B-Instruct, with a context length of 32768 tokens. Its primary function is to map a hidden-state vector taken from a large language model's residual stream (block 20 for this checkpoint) to a natural-language description.

Key Capabilities

LLM Interpretability: Translates specific activations within an LLM into human-readable text, helping researchers understand what those activations "mean".
Paired Functionality: Designed to work with its counterpart, the activation reconstructor (AR) kitft/nla-qwen2.5-7b-L20-ar. Together the two models form a complete NLA system for activation decoding and measurement.
Activation Decoding: Provides a method to "read out" the semantic content of an LLM's internal representations.
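
Because the AV and AR form an autoencoder pair, one natural way to measure a round trip is to compare the original activation with the vector the AR reconstructs from the AV's description. The source does not say which metric the authors use. The sketch below assumes cosine similarity and normalized squared error, and the function names are hypothetical. It works on plain Python lists and does not load either model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two activation vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def normalized_mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Squared reconstruction error divided by the squared norm of the original.

    0.0 means a perfect reconstruction; 1.0 is what an all-zero guess would score.
    """
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    energy = sum(x * x for x in original)
    if energy == 0.0:
        raise ValueError("normalized MSE is undefined for a zero original vector")
    error = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return error / energy


if __name__ == "__main__":
    # Toy 3-dimensional stand-ins; real block-20 activations are far wider.
    original = [1.0, 2.0, 2.0]        # activation fed to the AV (assumed)
    reconstructed = [1.0, 2.0, 1.0]   # vector returned by the AR (assumed)
    print("cosine similarity:", cosine_similarity(original, reconstructed))
    print("normalized MSE:", normalized_mse(original, reconstructed))
```

In practice the two vectors would come from the target model's residual stream and from the AR's output. How those are obtained depends on the NLA tooling, which this summary does not describe.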

Good for

Researchers in LLM Interpretability: Ideal for those studying the internal workings and representations of large language models.
Analyzing LLM Activations: Useful for generating natural language explanations for specific neural activations.
Developing Interpretability Tools: Serves as a foundational component for building more advanced tools to probe and understand LLM behavior.

Note: This model is not a general-purpose language model. It has been repurposed specifically for activation decoding, which makes it unsuitable for ordinary text generation or instruction-following tasks.
