vkasera/v3_qwen-2.5-3b-r1-countdown-phil

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Oct 3, 2025 · Architecture: Transformer

The vkasera/v3_qwen-2.5-3b-r1-countdown-phil model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by vkasera, this model was trained using the GRPO method, as introduced in the DeepSeekMath paper, to enhance its reasoning capabilities. It is optimized for tasks requiring advanced reasoning, building upon the base Qwen2.5 architecture.


Model Overview

This model, vkasera/v3_qwen-2.5-3b-r1-countdown-phil, is a 3.1 billion parameter language model derived from the Qwen/Qwen2.5-3B-Instruct base. It has been fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper for its effectiveness in mathematical reasoning tasks.

Key Training Details

  • Base Model: Qwen/Qwen2.5-3B-Instruct
  • Fine-tuning Method: GRPO, implemented via the TRL library.
  • Training Steps: 450 steps with a learning rate of 5.0e-7.
  • Context Length: Supports a maximum prompt length of 256 tokens and a maximum completion length of 1024 tokens during GRPO training.
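The hyperparameters above map directly onto TRL's GRPO configuration. The following is a minimal training sketch, assuming a recent TRL release that provides `GRPOConfig` and `GRPOTrainer`; the prompt dataset and the reward function are placeholders, since the actual countdown-task data and reward used for this model are not described in the card:

```python
# Sketch of a GRPO fine-tuning run with TRL, using the hyperparameters
# listed above. Dataset and reward function are illustrative placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

config = GRPOConfig(
    output_dir="v3_qwen-2.5-3b-r1-countdown-phil",
    learning_rate=5.0e-7,        # learning rate from the training details
    max_steps=450,               # 450 training steps
    max_prompt_length=256,       # max prompt tokens during GRPO training
    max_completion_length=1024,  # max completion tokens during GRPO training
)

# Placeholder prompt dataset: GRPOTrainer expects a "prompt" column.
prompt_dataset = Dataset.from_dict({"prompt": ["Example prompt"]})

def reward_fn(completions, **kwargs):
    """Placeholder reward: GRPO needs one scalar reward per completion."""
    return [0.0 for _ in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # the base model named above
    reward_funcs=reward_fn,
    args=config,
    train_dataset=prompt_dataset,
)
trainer.train()
```

In GRPO, the reward function scores each sampled completion, and the policy is updated from rewards normalized within each group of samples; the placeholder above would need to be replaced with a task-specific reward to reproduce the training described here.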

Potential Use Cases

Given its GRPO-based training, this model is likely to perform well in:

  • Reasoning-intensive tasks: Especially those requiring structured thought processes.
  • Instruction following: Leveraging its Instruct base and fine-tuning.
  • Exploratory applications: For users interested in models trained with advanced reinforcement learning techniques.
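Since the model inherits the Instruct chat format from its base, it can be queried through the standard transformers chat interface. A minimal inference sketch, assuming the checkpoint is available on the Hugging Face Hub under the repository name in the title and that there is enough memory for a 3B BF16 model:

```python
# Minimal inference sketch; running it downloads the BF16 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vkasera/v3_qwen-2.5-3b-r1-countdown-phil"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# An illustrative reasoning prompt; the actual training task is not
# documented in the card.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# max_new_tokens mirrors the 1024-token completion budget used in training.
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Keeping prompts near the 256-token budget and completions within 1024 tokens mirrors the lengths the model saw during GRPO training, though the base model's 32k context remains available.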