xzhiying/qwen-2.5-3b-r1-countdown
Hugging Face · Text Generation · 3.1B parameters · BF16 · 32k context · Transformer

The xzhiying/qwen-2.5-3b-r1-countdown model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by xzhiying, it was trained with the GRPO method introduced in the DeepSeekMath paper, and is optimized for tasks that demand mathematical reasoning and robust instruction following, making it suitable for complex problem-solving applications.


Model Overview

The xzhiying/qwen-2.5-3b-r1-countdown is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B-Instruct architecture. This model was developed by xzhiying and utilizes the TRL framework for its training process.
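Since this is a standard Transformers checkpoint, it can presumably be loaded with the usual `AutoModelForCausalLM` API. The sketch below is untested against the actual weights, and the plain-text prompt wrapper is a placeholder assumption; a real chat template should come from the tokenizer itself via `tokenizer.apply_chat_template`.

```python
# Minimal loading/inference sketch for this checkpoint (assumptions noted above).
MODEL_ID = "xzhiying/qwen-2.5-3b-r1-countdown"

def format_prompt(question: str) -> str:
    # Hypothetical plain-text wrapper, not the model's documented template.
    return f"User: {question}\nAssistant:"

def run_demo() -> str:
    # Downloads the BF16 weights (~6 GB) on first call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    inputs = tokenizer(format_prompt("Use 3, 5 and 7 to reach 22."),
                       return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

The `torch_dtype="bfloat16"` argument matches the BF16 quantization listed in the card header.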

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This training choice targets tasks requiring strong mathematical reasoning and problem-solving abilities.
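GRPO dispenses with a learned value function: for each prompt it samples a group of completions, scores them with a reward function, and uses the group-normalized reward as the advantage. The toy sketch below shows only that advantage computation, not the paper's full objective (which also includes a clipped policy ratio and a KL penalty).

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    A_i = (r_i - mean(r)) / (std(r) + eps), computed over the G completions
    sampled for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled completions for one prompt, binary correctness reward.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are computed relative to siblings from the same prompt, correct completions are pushed up and incorrect ones pushed down even when absolute rewards are sparse.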

Potential Use Cases

Given its fine-tuning and the GRPO training method, this model is likely well-suited for:

  • Mathematical problem-solving: Tasks involving complex calculations, proofs, or logical deductions.
  • Reasoning-intensive applications: Scenarios where the model needs to follow intricate instructions and derive conclusions.
  • Instruction-following tasks: General applications where precise adherence to user prompts is critical.
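The "countdown" suffix in the model name suggests the R1-style recipe of training on the Countdown arithmetic game: combine a set of given numbers with +, -, *, / to reach a target. That task suits GRPO well because correctness is trivially checkable. Below is a hypothetical reward function of that shape; the actual training reward is not documented on this card.

```python
import ast
import operator
from collections import Counter

# Allowed binary operators for a Countdown expression.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def countdown_reward(expr: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if `expr` uses exactly the given numbers (each once)
    and evaluates to `target`; otherwise 0.0."""
    try:
        tree = ast.parse(expr, mode="eval")
        # Only constants combined with the four basic operators are allowed.
        ok_nodes = all(isinstance(n, (ast.Expression, ast.BinOp, ast.Constant))
                       or type(n) in OPS for n in ast.walk(tree))
        used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
        if not ok_nodes or Counter(used) != Counter(numbers):
            return 0.0

        def ev(node):
            if isinstance(node, ast.Constant):
                return node.value
            return OPS[type(node.op)](ev(node.left), ev(node.right))

        return 1.0 if abs(ev(tree.body) - target) < 1e-9 else 0.0
    except (SyntaxError, ZeroDivisionError, KeyError):
        return 0.0
```

A verifiable reward like this is exactly what group-sampled GRPO needs: each sampled completion gets a binary score with no learned reward model in the loop.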

Technical Details

The model was trained using specific versions of popular frameworks:

  • TRL: 0.14.0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1+cu121
  • Datasets: 3.1.0
  • Tokenizers: 0.21.4
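To reproduce this environment, the reported versions can be pinned directly. Package names are assumed to be the standard PyPI ones, and the CUDA-specific torch build typically comes from the PyTorch wheel index rather than PyPI.

```shell
pip install "trl==0.14.0" "transformers==4.48.1" \
    "datasets==3.1.0" "tokenizers==0.21.4"
# CUDA 12.1 build of PyTorch, served from the PyTorch wheel index:
pip install "torch==2.5.1+cu121" --index-url https://download.pytorch.org/whl/cu121
```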

This model provides a solid foundation for developers looking for a compact yet capable language model with enhanced reasoning capabilities, particularly in mathematical contexts.