philschmid/qwen-2.5-3b-r1-countdown
The philschmid/qwen-2.5-3b-r1-countdown model is a 3.1-billion-parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct using TRL and GRPO on a dataset derived from the Countdown game. It targets mathematical reasoning and problem-solving tasks, particularly arithmetic puzzles that combine a given set of numbers to reach a target number.
Model Overview
philschmid/qwen-2.5-3b-r1-countdown is a specialized 3.1 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct base model. Its primary focus is on mathematical reasoning, specifically designed to solve problems similar to the Countdown game, where the goal is to use a given set of numbers and basic arithmetic operations to reach a target number.
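To make the task concrete, here is a minimal brute-force solver for a simplified version of the game. It only searches left-to-right chains over all of the given numbers (the full Countdown game also allows arbitrary groupings and using a subset of the numbers), and is an illustration of the task, not part of the model or its training code:

```python
from itertools import permutations, product

# Integer-safe arithmetic ops; division is only allowed when it is exact.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b if b != 0 and a % b == 0 else None,
}

def solve_countdown(numbers, target):
    """Return a parenthesized expression reaching `target`, or None.

    Searches every ordering of `numbers` combined with every operator
    sequence, applied strictly left to right.
    """
    for perm in permutations(numbers):
        for op_seq in product(OPS, repeat=len(perm) - 1):
            acc, expr = perm[0], str(perm[0])
            for op, n in zip(op_seq, perm[1:]):
                acc = OPS[op](acc, n)
                if acc is None:
                    break
                expr = f"({expr} {op} {n})"
            if acc == target:
                return expr
    return None

print(solve_countdown([3, 7, 2], 20))  # e.g. ((3 + 7) * 2)
```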
Key Capabilities
- Mathematical Reasoning: Demonstrates proficiency in solving arithmetic puzzles, showing step-by-step thought processes.
- GRPO Training: Utilizes GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, to enhance its reasoning abilities.
- Instruction Following: Capable of following detailed instructions for problem-solving, including structured output formats with `<think>` and `<answer>` tags.
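A downstream consumer of this model typically needs to separate the reasoning trace from the final answer. A small sketch of how that parsing could look (the sample completion below is illustrative, not actual model output):

```python
import re

def parse_completion(text):
    """Split a tagged completion into (reasoning, answer).

    Returns None for either part when its tag pair is missing.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

# Hypothetical completion in the model's expected output format.
completion = (
    "<think>Try 95 - 51 = 44, which matches the target.</think>\n"
    "<answer>95 - 51</answer>"
)
reasoning, answer = parse_completion(completion)
print(answer)  # 95 - 51
```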
Training Details
This model was trained using the TRL library and the GRPO technique on a dataset derived from the Countdown game. The training procedure aims to reproduce the "aha" moment of mathematical discovery, as detailed in a blog post by the model's creator. It uses a 32,768-token context length, allowing for long reasoning chains.
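GRPO optimizes the policy against programmatically verifiable rewards. Below is a sketch of the kind of equation-correctness reward such a setup could use for Countdown: it checks that the `<answer>` expression uses exactly the given numbers and evaluates to the target. The function name and exact scoring are assumptions for illustration, not the precise reward functions from this training run:

```python
import re

def equation_reward(completion, numbers, target):
    """Return 1.0 if the <answer> equation is valid, else 0.0.

    Valid means: answer tags present, only arithmetic characters used,
    each given number used exactly once, and the result equals `target`.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    equation = match.group(1).strip()
    # Restrict to digits, basic operators, parentheses, and whitespace.
    if not re.fullmatch(r"[\d+\-*/(). ]+", equation):
        return 0.0
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return 0.0
    try:
        result = eval(equation, {"__builtins__": None}, {})
    except Exception:
        return 0.0
    return 1.0 if abs(result - target) < 1e-6 else 0.0

print(equation_reward("<answer>(100 - 75) * 2</answer>", [100, 75, 2], 50))  # 1.0
```

During training, a reward like this is computed for each sampled completion in a group, and GRPO updates the policy toward completions that score above the group average.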