lhkhiem28/Qwen2.5-3B-ha_grpo
lhkhiem28/Qwen2.5-3B-ha_grpo is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained using the HA_GRPO method on the lhkhiem28/HA-GRPO-datasets, a technique specifically designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring advanced mathematical problem-solving and logical deduction, offering a 32768 token context length.
Loading preview...
Overview
lhkhiem28/Qwen2.5-3B-ha_grpo is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. Its primary distinction lies in its specialized training using the HA_GRPO method on the lhkhiem28/HA-GRPO-datasets. This training approach, introduced in the DeepSeekMath paper, focuses on significantly improving the model's mathematical reasoning abilities.
Key Capabilities
- Enhanced Mathematical Reasoning: Specifically optimized for complex mathematical problem-solving and logical deduction, leveraging the HA_GRPO training methodology.
- Instruction Following: Retains the instruction-following capabilities of its base Qwen2.5-3B-Instruct model.
- Context Length: Supports a substantial context window of 32768 tokens, beneficial for multi-step reasoning tasks.
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks involving logical reasoning and complex calculations.
- Developers looking for a compact model with specialized mathematical prowess, building upon the Qwen2.5 architecture.