Minsang/TSD-KD_Qwen2.5-1.5B
Minsang/TSD-KD_Qwen2.5-1.5B is a 1.5-billion-parameter causal language model developed by Minsang and based on the Qwen2.5 architecture. It is trained with Token-Selective Dual Knowledge Distillation to improve reasoning, as described in the ICLR 2026 paper "Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation", and supports a context length of 32768 tokens.
Overview
Minsang/TSD-KD_Qwen2.5-1.5B is a 1.5-billion-parameter language model built on the Qwen2.5 architecture. It was developed by Minsang and introduced in the ICLR 2026 paper "Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation". Its distinguishing feature is the training method: Token-Selective Dual Knowledge Distillation, a knowledge distillation technique aimed at improving the model's reasoning ability.
Key Capabilities
- Enhanced Reasoning: Trained with Token-Selective Dual Knowledge Distillation, a knowledge distillation method aimed at improving reasoning performance.
- Qwen2.5 Base: Built on the Qwen2.5 architecture, which provides the foundation for language understanding and generation.
- Long Context Window: Supports a context length of 32768 tokens, so longer inputs can fit in a single prompt.
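Because a causal language model's prompt and generated tokens share the same context window, the 32768-token limit bounds the two together. A minimal sketch of that arithmetic in Python follows; the helper names are hypothetical and not part of any library, and only the 32768 figure comes from this card.

```python
# Context length stated in this model card.
MAX_CONTEXT_TOKENS = 32768


def max_new_tokens_for(prompt_tokens: int,
                       context_limit: int = MAX_CONTEXT_TOKENS) -> int:
    """Return how many tokens can still be generated after the prompt.

    Hypothetical helper: assumes prompt and output share one window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt longer than the window leaves no room for generation.
    return max(context_limit - prompt_tokens, 0)


def fits_in_context(prompt_tokens: int,
                    requested_new_tokens: int,
                    context_limit: int = MAX_CONTEXT_TOKENS) -> bool:
    """Check whether prompt plus requested output stays within the window."""
    return prompt_tokens + requested_new_tokens <= context_limit
```

For example, a 30000-token prompt leaves room for at most 2768 generated tokens under this limit.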
Good For
- Research in Reasoning: Suited to researchers studying reasoning in language models, particularly those working on knowledge distillation methods.
- Applications Requiring Improved Logic: Suitable for use cases where logical inference and explanation generation are critical.
- Academic and Experimental Projects: A candidate for projects that apply or evaluate the "Token-Selective Dual Knowledge Distillation" methodology. Further details can be found in the associated research paper.
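For experimentation, the model could be loaded roughly as follows. This is a sketch under assumptions, not a documented recipe: it assumes the weights are published on the Hugging Face Hub under the ID Minsang/TSD-KD_Qwen2.5-1.5B in a format the `transformers` library can load, and that a plain-text prompt is acceptable. The prompt template in `build_prompt` is hypothetical; this card does not specify one.

```python
# Model name taken from this card; availability on the Hub is an assumption.
MODEL_ID = "Minsang/TSD-KD_Qwen2.5-1.5B"


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt format (not specified by the card)."""
    return f"Question: {question.strip()}\nLet's reason step by step.\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Load the model and return only the newly generated text."""
    # Imported lazily so build_prompt works without the library installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Slice off the prompt tokens so the prompt is not echoed back.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is 17 * 6?")` would download the weights on first use, so it needs network access and enough memory for a 1.5-billion-parameter model.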