gregdlg/qwen-2.5-3b-r1-countdown-coloc
The gregdlg/qwen-2.5-3b-r1-countdown-coloc model is a 3.1-billion-parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with the TRL framework using the GRPO method, which is designed to enhance mathematical reasoning. The result is a Qwen2.5-based model aimed at tasks that require structured, multi-step reasoning, particularly in mathematical contexts.
Model Overview
gregdlg/qwen-2.5-3b-r1-countdown-coloc is a 3.1-billion-parameter language model fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model, trained with the TRL (Transformer Reinforcement Learning) library.
Key Training Innovation
A significant aspect of this model's development is the integration of GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", is specifically designed to improve a model's mathematical reasoning abilities. By applying GRPO, this Qwen2.5 variant aims to improve performance on complex reasoning tasks.
Technical Specifications
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Parameter Count: 3.1 billion
- Context Length: 32768 tokens
- Training Frameworks: TRL (version 1.2.0), Transformers (version 4.57.6), PyTorch (version 2.10.0), Datasets (version 4.8.4), Tokenizers (version 0.22.2).
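The specifications above map to a standard transformers loading sketch. This is a hedged example, not an official snippet from the model authors: the repo id is taken from this card's title, and the generation settings are illustrative assumptions.

```python
# Hedged sketch: loading this model with the Hugging Face transformers API.
# The repo id comes from this model card; sampling settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gregdlg/qwen-2.5-3b-r1-countdown-coloc"  # from this model card

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Download the weights (~3.1B params) and run one chat completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    # Qwen2.5-Instruct derivatives ship a chat template; apply it here.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens before decoding the reply.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example call (downloads the weights on first use):
# print(generate("Using the numbers [2, 7, 25], create an equation that equals 39."))
```

The 32768-token context length means long multi-step reasoning traces fit in a single prompt; for a 3.1B model, `device_map="auto"` lets transformers place the weights on whatever GPU or CPU memory is available.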
Potential Use Cases
Given its fine-tuning with GRPO, this model is particularly suited for applications requiring:
- Mathematical problem-solving: Tasks that involve numerical reasoning, equations, and logical deduction.
- Complex reasoning: Scenarios where structured thought processes are crucial.
- Instruction-following: Benefiting from its Instruct base, it can handle detailed prompts effectively.
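The "countdown" in the model name presumably refers to the Countdown numbers task used in several R1-style GRPO reproductions (an assumption, not stated on this card): given a set of numbers and a target, the model must produce an arithmetic expression that uses each number exactly once and evaluates to the target. A minimal checker for that task, useful for scoring the model's outputs, might look like this (`check_countdown` is a hypothetical helper, not part of any library):

```python
# Hedged sketch of a verifier for the Countdown numbers task: accept an
# expression only if it uses exactly the given numbers and hits the target.
import ast
import re

def check_countdown(expression: str, numbers: list[int], target: int) -> bool:
    """Return True if `expression` uses exactly `numbers` and equals `target`."""
    # Reject anything but digits, whitespace, parentheses, and + - * / .
    if not re.fullmatch(r"[\d\s()+\-*/.]+", expression):
        return False
    # The multiset of integer literals must match the given numbers.
    used = sorted(int(n) for n in re.findall(r"\d+", expression))
    if used != sorted(numbers):
        return False
    try:
        # Evaluate the vetted arithmetic string via an ast-parsed expression.
        value = eval(compile(ast.parse(expression, mode="eval"), "<expr>", "eval"))
    except (SyntaxError, ZeroDivisionError):
        return False
    return abs(value - target) < 1e-9

# A correct solution passes, an incorrect one fails:
print(check_countdown("(25 + 2) * 7 - 150", [25, 2, 7, 150], 39))  # → True
print(check_countdown("25 + 2 + 7", [25, 2, 7], 39))               # → False
```

A rule-based checker like this is exactly the kind of verifiable reward signal GRPO training pipelines rely on, since correctness can be computed without a learned reward model.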