Elliott/Qwen2.5-Math-7B-16k-think

Model Overview

Elliott/Qwen2.5-Math-7B-16k-think is a specialized language model derived from the Qwen2.5-Math-7B base, developed by Jianhao Yan and his team as part of the LUFFY project. This model is designed to enhance reasoning capabilities, particularly in mathematical contexts, as detailed in their research paper, "Learning to Reason under Off-Policy Guidance" (arXiv:2504.14945).
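Below is a minimal inference sketch using the Hugging Face transformers library. The prompt content and generation settings are illustrative assumptions, and it presumes the repository ships the modified chat template described under Key Enhancements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Elliott/Qwen2.5-Math-7B-16k-think"

# Load the tokenizer (which carries the modified chat template) and the model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on available device(s)
)

# An illustrative math prompt; the chat template is expected to open the
# assistant turn with a <think> token so the model reasons before answering.
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```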

Key Enhancements

  • Extended Context Window: The model's context window has been expanded to 16,384 tokens (16k), up from the base model's 4,096, by raising the rope_theta parameter from 10,000 to 40,000 (see the configuration sketch after this list).
  • Reasoning Optimization: It incorporates a modified chat template that inserts a <think> token, cueing the model to produce an explicit reasoning trace before its final answer.
  • Base Model: Built upon the robust Qwen2.5-Math-7B architecture, known for its mathematical proficiency.
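A simple way to verify the RoPE adjustment described above is to inspect the published configuration. This is a minimal sketch; the field names are standard Qwen2 config keys, and the expected values should be confirmed against the repository itself.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Elliott/Qwen2.5-Math-7B-16k-think")

# rope_theta is the RoPE base frequency; raising it from 10,000 to 40,000
# stretches the positional encoding so attention remains usable over the
# extended 16k-token window.
print(config.rope_theta)               # expected: 40000.0
print(config.max_position_embeddings)  # the configured context length
```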

Ideal Use Cases

  • Mathematical Problem Solving: Suited for applications requiring advanced mathematical reasoning and computation.
  • Complex Reasoning Tasks: Beneficial for scenarios where a longer context and structured thinking process (via the <think> token) can improve output quality.
  • Research in Reasoning: A valuable tool for researchers exploring off-policy guidance and reasoning in large language models.