lhkhiem28/Qwen2.5-3B-ha_grpo

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 5, 2026Architecture:Transformer Cold

lhkhiem28/Qwen2.5-3B-ha_grpo is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained using the HA_GRPO method on the lhkhiem28/HA-GRPO-datasets, a technique specifically designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring advanced mathematical problem-solving and logical deduction, offering a 32768 token context length.

Loading preview...

Overview

lhkhiem28/Qwen2.5-3B-ha_grpo is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. Its primary distinction lies in its specialized training using the HA_GRPO method on the lhkhiem28/HA-GRPO-datasets. This training approach, introduced in the DeepSeekMath paper, focuses on significantly improving the model's mathematical reasoning abilities.

Key Capabilities

  • Enhanced Mathematical Reasoning: Specifically optimized for complex mathematical problem-solving and logical deduction, leveraging the HA_GRPO training methodology.
  • Instruction Following: Retains the instruction-following capabilities of its base Qwen2.5-3B-Instruct model.
  • Context Length: Supports a substantial context window of 32768 tokens, beneficial for multi-step reasoning tasks.

Good For

  • Applications requiring strong mathematical problem-solving.
  • Tasks involving logical reasoning and complex calculations.
  • Developers looking for a compact model with specialized mathematical prowess, building upon the Qwen2.5 architecture.