gregdlg/qwen-2.5-3b-r1-countdown
The gregdlg/qwen-2.5-3b-r1-countdown model is a 3.1-billion-parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by gregdlg, it was trained with the TRL framework using GRPO, a reinforcement-learning method designed to enhance mathematical reasoning. With a 32,768-token context length, it targets tasks that require multi-step reasoning, particularly in mathematical contexts.
Model Overview
The gregdlg/qwen-2.5-3b-r1-countdown is a 3.1-billion-parameter language model fine-tuned from the base Qwen/Qwen2.5-3B-Instruct architecture. It was developed by gregdlg and trained using the TRL framework, specifically incorporating GRPO (Group Relative Policy Optimization).
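As a fine-tune of an Instruct model, the checkpoint can be loaded with the standard `transformers` causal-LM APIs. The sketch below is a minimal inference example; the system prompt and chat-template usage are assumptions carried over from the Qwen2.5-Instruct family, not documented for this specific checkpoint:

```python
MODEL_ID = "gregdlg/qwen-2.5-3b-r1-countdown"


def build_messages(question: str) -> list[dict]:
    # Qwen2.5-Instruct checkpoints use a chat format; the system prompt
    # here is a generic placeholder, not the one used in training.
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question},
    ]


def generate(question: str, max_new_tokens: int = 512) -> str:
    # Deferred import so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Generation settings (temperature, sampling) are left at their defaults here; tune them per task.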
Key Training Details
The model's training procedure leveraged GRPO, a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This points to an optimization focus on complex reasoning and mathematical problem-solving. The training used the following framework versions:
- TRL: 1.2.0
- Transformers: 4.57.6
- PyTorch: 2.10.0
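The model card does not document the reward function used during GRPO training. In the common R1-style "countdown" recipe, the reward is a simple programmatic check of the generated equation. A toy version, assuming completions wrap their final equation in `<answer>…</answer>` tags (an assumed format, not confirmed for this checkpoint), might look like:

```python
import re


def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Toy countdown reward: 1.0 if the completion's <answer> equation uses
    exactly the given numbers and evaluates to the target, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # no parseable answer
    expr = match.group(1).strip()
    # Allow only integers, the four operators, parentheses, and spaces
    # before passing the string to eval().
    if not re.fullmatch(r"[\d+\-*/() ]+", expr):
        return 0.0
    # Every provided number must be used exactly once, and no others.
    used = sorted(int(n) for n in re.findall(r"\d+", expr))
    if used != sorted(numbers):
        return 0.0
    try:
        value = eval(expr)
    except (SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0
```

In TRL, a function like this would be passed to the trainer as one of its reward functions; the exact reward and prompt format used for this checkpoint remain undocumented.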
Potential Use Cases
Given its fine-tuning with GRPO, this model is likely well-suited for applications requiring:
- Mathematical problem-solving: Tasks involving arithmetic, algebra, and other quantitative reasoning.
- Logical deduction: Scenarios where the model needs to follow complex chains of thought.
- Instruction following: Benefiting from its Instruct-tuned base model, it can process and respond to detailed user prompts effectively.