gregdlg/qwen-2.5-3b-r1-countdown

Text generation · Concurrency cost: 1 · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: Apr 19, 2026 · Architecture: Transformer

The gregdlg/qwen-2.5-3b-r1-countdown model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by gregdlg, it was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a method designed to enhance mathematical reasoning. The model targets tasks that require advanced reasoning, particularly in mathematical contexts, and supports a 32,768-token context length.


Model Overview

gregdlg/qwen-2.5-3b-r1-countdown is a 3.1 billion parameter language model fine-tuned from the Qwen/Qwen2.5-3B-Instruct base model. It was developed by gregdlg and trained with the TRL framework, specifically using the GRPO (Group Relative Policy Optimization) method.

Key Training Details

The model's training procedure used GRPO, a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," which indicates a training focus on complex reasoning and mathematical problem solving. The training used the following framework versions:

  • TRL: 1.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0
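The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt and normalize each completion's reward against the group's mean and standard deviation, rather than training a separate value function. A minimal sketch of that advantage computation (illustrative names, not the actual TRL implementation):

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Function name and epsilon handling are illustrative assumptions,
# not taken from TRL's GRPOTrainer.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each completion's reward against its group's statistics.

    GRPO samples a group of completions for one prompt and uses the
    group mean/std as the baseline instead of a learned value model.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt, scored 0/1 by a rule-based reward:
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Above-average completions get positive advantage, below-average negative
assert advantages[0] > 0 and advantages[1] < 0
```

Because the baseline is computed per group, no critic network is needed, which is part of why GRPO is attractive for fine-tuning small models like this one.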

Potential Use Cases

Given its fine-tuning with GRPO, this model is likely well-suited for applications requiring:

  • Mathematical problem-solving: Tasks involving arithmetic, algebra, and other quantitative reasoning.
  • Logical deduction: Scenarios where the model needs to follow complex chains of thought.
  • Instruction following: Benefiting from its Instruct base, it can process and respond to detailed user prompts effectively.
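The "countdown" in the model name presumably refers to the Countdown numbers game (combine a set of given numbers with +, −, ×, ÷ to reach a target), a common rule-verifiable task for GRPO fine-tuning. A hypothetical rule-based reward for that task, assuming an R1-style `<answer>...</answer>` output format, might look like this (a sketch, not the author's actual reward function):

```python
# Hypothetical rule-based reward for the Countdown numbers game.
# The <answer> tag format and scoring rules are assumptions; the
# model card does not specify the actual reward used in training.
import re

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if the completion's equation is valid and correct, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    equation = match.group(1).strip()
    # Allow only digits, arithmetic operators, parentheses and whitespace
    if not re.fullmatch(r"[\d+\-*/().\s]+", equation):
        return 0.0
    # Each provided number must be used exactly once
    used = sorted(int(n) for n in re.findall(r"\d+", equation))
    if used != sorted(numbers):
        return 0.0
    try:
        result = eval(equation, {"__builtins__": None}, {})
    except Exception:
        return 0.0
    return 1.0 if abs(result - target) < 1e-6 else 0.0

# A completion in the R1-style reasoning format
out = "<think>try (95 + 5) - 45</think><answer>(95 + 5) - 45</answer>"
assert countdown_reward(out, [95, 5, 45], 55) == 1.0
```

Rewards like this are binary and automatically checkable, which makes the task well suited to GRPO's group-relative scoring: within a sampled group, correct equations stand out against incorrect ones without any learned reward model.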