AnhLD2610/Qwen2.5-7B-Instruct-latent-thought
AnhLD2610/Qwen2.5-7B-Instruct-latent-thought is a 7.6-billion-parameter instruction-tuned causal language model based on Qwen/Qwen2.5-7B-Instruct, with an extended tokenizer and a 131072-token context length. The model adds a `<|latent_thought|>` special token intended to support latent reasoning, and is aimed at research and applications that require explicit handling of internal thought processes during generation. Adding the token increases the vocabulary size by one, and the embedding matrix is resized accordingly.
AnhLD2610/Qwen2.5-7B-Instruct-latent-thought Overview
This model is a specialized variant of the Qwen/Qwen2.5-7B-Instruct base model, developed by AnhLD2610. It retains the base model's 7.6-billion-parameter architecture and 131072-token context length; the primary modification is an extended tokenizer that introduces a new special token.
Key Modifications & Capabilities
- Added Special Token: A new token, `<|latent_thought|>`, has been integrated into the vocabulary and assigned ID 151665.
- Enhanced Reasoning Support: The `<|latent_thought|>` token is specifically designed to support and potentially enable latent reasoning capabilities within the model's output generation.
- Vocabulary Expansion: The model's vocabulary size has been increased by one token, and its embedding matrix has been resized to accommodate this addition.
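The mechanics of this modification can be sketched with a scaled-down toy example (the real vocabulary covers IDs 0 through 151664, so the new token receives ID 151665; the sizes below are illustrative, not the model's actual dimensions):

```python
import random

# Toy vocabulary and embedding table standing in for the real tokenizer/model.
# Assumption: sizes are scaled down for illustration; only the token string
# <|latent_thought|> comes from the model card.
vocab = {f"tok{i}": i for i in range(10)}  # pretend base vocabulary
dim = 4                                    # pretend embedding dimension
embeddings = [[random.random() for _ in range(dim)] for _ in vocab]

def add_special_token(token, vocab, embeddings, dim):
    """Append one token: its ID is the old vocab size, and the embedding
    table grows by one row."""
    new_id = len(vocab)
    vocab[token] = new_id
    # The new row is randomly initialized here; frameworks typically do the
    # same (or copy the mean of existing rows) before fine-tuning.
    embeddings.append([random.random() for _ in range(dim)])
    return new_id

new_id = add_special_token("<|latent_thought|>", vocab, embeddings, dim)
print(new_id, len(vocab), len(embeddings))
```

With Hugging Face `transformers`, the equivalent steps are `tokenizer.add_special_tokens({"additional_special_tokens": ["<|latent_thought|>"]})` followed by `model.resize_token_embeddings(len(tokenizer))`.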
Use Cases
This model is particularly suited for:
- Research into Latent Reasoning: Exploring and developing methods for models to explicitly represent or utilize internal thought processes.
- Advanced Instruction Following: Potentially improving the model's ability to follow complex instructions by allowing for an internal 'thought' step.
- Experimental AI Applications: Developing applications that benefit from or require the explicit handling of a model's reasoning path.
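As a hypothetical illustration of the research use cases above, latent-thought spans could be delimited with the special token and stripped from the final output. The delimiting convention below (a matched pair of markers, and the helper names `mark_latent`/`strip_latent`) is an assumption for illustration; only the token string itself comes from the model card:

```python
# Hypothetical helpers: the single special token is used here as a paired
# delimiter around internal reasoning. This convention is an assumption,
# not documented behavior of the model.
LATENT = "<|latent_thought|>"

def mark_latent(thought: str) -> str:
    """Wrap a latent-thought segment in the special token."""
    return f"{LATENT}{thought}{LATENT}"

def strip_latent(text: str) -> str:
    """Remove all latent-thought segments, keeping only visible text."""
    out, inside = [], False
    for part in text.split(LATENT):
        if not inside:
            out.append(part)
        inside = not inside
    return "".join(out)

text = "Question: 2+2? " + mark_latent("add the operands") + "Answer: 4"
print(strip_latent(text))  # → Question: 2+2? Answer: 4
```

Post-processing like this lets an application log or discard the model's internal reasoning path independently of the user-facing answer.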