ShenLinxi/qwen-2.5-3b-r1-countdown

Text Generation | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Mar 3, 2025 | Architecture: Transformer

ShenLinxi/qwen-2.5-3b-r1-countdown is a 3.1-billion-parameter language model fine-tuned with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. The fine-tuning targets mathematical reasoning. The card does not specify the base model, although the repository name suggests Qwen2.5-3B. Training was done with TRL, and the model is intended for applications that require mathematical understanding and generation.


Model Overview

This model, ShenLinxi/qwen-2.5-3b-r1-countdown, is a 3.1-billion-parameter language model fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). Training used the TRL (Transformer Reinforcement Learning) library.
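The core idea of GRPO, as described in the DeepSeekMath paper, is a group-relative advantage: several completions are sampled for the same prompt, and each completion's reward is compared with the rest of its group rather than with a learned value estimate. The sketch below shows only that normalization step in plain Python. It is not the TRL implementation; the function name `group_relative_advantages`, the epsilon value, and the use of the population standard deviation are assumptions (implementations differ on the last two).

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against the mean and std of its own group (hypothetical helper).

    `rewards` holds the scalar rewards of the G completions sampled for ONE prompt.
    Assumption: population standard deviation and eps=1e-4; other implementations
    may use the sample standard deviation or a different epsilon.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # Completions scoring above the group average get a positive advantage and
    # those below it a negative one; no value (critic) model is involved.
    return [(r - mean) / (std + eps) for r in rewards]


# Example: 4 completions for one prompt, rewarded 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # → [1.0, -1.0, -1.0, 1.0]
```

Because the advantages are centered on the group mean, a group in which every completion receives the same reward yields all-zero advantages, so that prompt contributes no learning signal for that step.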

Key Capabilities

  • Mathematical Reasoning: The fine-tuning primarily targets mathematical reasoning, using GRPO as the training method; the card reports no benchmark results that quantify the improvement.
  • Instruction Following: The model is designed to answer user prompts directly, as demonstrated by the quick start example.

Training Details

  • Methodology: GRPO samples a group of completions for each prompt and scores each one with a reward; a completion's advantage is its reward normalized by the group's mean and standard deviation, so no separate value (critic) model is needed.
  • Frameworks: Trained with TRL (version 0.14.0), Transformers (version 4.49.0), PyTorch (version 2.5.1+cu121), Datasets (version 3.1.0), and Tokenizers (version 0.21.0).
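GRPO needs a scalar reward for every sampled completion, and the card does not say which reward this model was trained with. As a hedged illustration only, the sketch below is a rule-based exact-match reward written in the shape TRL's `GRPOTrainer` accepts for custom reward functions: a callable that receives `completions` (plain strings, for non-conversational prompts) plus the remaining dataset columns as keyword arguments, and returns one float per completion. The `<answer>...</answer>` tag format, the `answer` dataset column, and the name `answer_reward` are all hypothetical.

```python
import re
from typing import List

# Hypothetical output format: assumes completions wrap the final result in
# <answer>...</answer> tags, as in DeepSeek-R1-style prompts. The card does not
# state the format or reward actually used for this model.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def answer_reward(completions: List[str], answer: List[str], **kwargs) -> List[float]:
    """Return 1.0 for each completion whose tagged answer matches the reference, else 0.0.

    `answer` is a hypothetical dataset column holding one reference string per
    completion; any other columns (e.g. `prompts`) arrive in **kwargs and are ignored.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        match = ANSWER_RE.search(completion)
        # A completion without a well-formed <answer> tag earns no reward.
        if match is None:
            rewards.append(0.0)
            continue
        predicted = match.group(1).strip()
        rewards.append(1.0 if predicted == reference.strip() else 0.0)
    return rewards


completions = [
    "<think>3 * 4 = 12</think> <answer>12</answer>",
    "<answer>13</answer>",
    "The answer is 12",
]
print(answer_reward(completions, answer=["12", "12", "12"]))  # → [1.0, 0.0, 0.0]
```

Exact string comparison is the simplest possible check; a reward for mathematical answers might instead parse and compare numeric values, or add a separate term for following the output format.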

Good For

  • Applications requiring strong mathematical problem-solving.
  • Research and development in enhancing LLMs for complex reasoning tasks.
  • Generating responses to mathematical or logic-based queries.