Name: ShenLinxi/qwen-2.5-3b-r1-countdown API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ShenLinxi

Model Overview

This model, ShenLinxi/qwen-2.5-3b-r1-countdown, is a 3.1 billion parameter language model. It has been fine-tuned using the GRPO (Gradient Regularized Policy Optimization) method, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities

Mathematical Reasoning: The primary focus of this model's fine-tuning is to enhance its ability in mathematical reasoning, leveraging the GRPO method for improved performance in this domain.
Instruction Following: As an instruction-tuned model, it is designed to respond to user prompts effectively, as demonstrated by the quick start example.

Training Details

Methodology: Utilizes the GRPO method, which is associated with advancements in mathematical reasoning for large language models.
Frameworks: Trained with TRL (version 0.14.0), Transformers (version 4.49.0), Pytorch (version 2.5.1+cu121), Datasets (version 3.1.0), and Tokenizers (version 0.21.0).

Good For

Applications requiring strong mathematical problem-solving.
Research and development in enhancing LLMs for complex reasoning tasks.
Generating responses to mathematical or logic-based queries.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)