vkasera/v3_qwen-2.5-3b-r1-countdown-phil

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Oct 3, 2025 · Architecture: Transformer

The vkasera/v3_qwen-2.5-3b-r1-countdown-phil model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by vkasera, this model was trained using the GRPO method, as introduced in the DeepSeekMath paper, to enhance its reasoning capabilities. It is optimized for tasks requiring advanced reasoning, building upon the base Qwen2.5 architecture.


Model Overview

This model, vkasera/v3_qwen-2.5-3b-r1-countdown-phil, is a 3.1 billion parameter language model derived from the Qwen/Qwen2.5-3B-Instruct base. It has been fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper for its effectiveness in mathematical reasoning tasks.

Key Training Details

  • Base Model: Qwen/Qwen2.5-3B-Instruct
  • Fine-tuning Method: GRPO, implemented via the TRL library.
  • Training Steps: 450 steps with a learning rate of 5.0e-7.
  • Context Length: Supports a maximum prompt length of 256 tokens and a maximum completion length of 1024 tokens during GRPO training.
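The hyperparameters above map directly onto TRL's GRPO configuration. The following is a minimal training sketch, assuming a recent TRL release that provides `GRPOConfig` and `GRPOTrainer`; the prompt dataset and the reward function are placeholders, since the actual countdown-task data and reward used for this model are not described in the card:

```python
# Sketch of a GRPO fine-tuning run with TRL, using the hyperparameters
# listed above. Dataset and reward function are illustrative placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

config = GRPOConfig(
    output_dir="v3_qwen-2.5-3b-r1-countdown-phil",
    learning_rate=5.0e-7,        # learning rate from the training details
    max_steps=450,               # 450 training steps
    max_prompt_length=256,       # max prompt tokens during GRPO training
    max_completion_length=1024,  # max completion tokens during GRPO training
)

# Placeholder prompt dataset: GRPOTrainer expects a "prompt" column.
prompt_dataset = Dataset.from_dict({"prompt": ["Example prompt"]})

def reward_fn(completions, **kwargs):
    """Placeholder reward: GRPO needs one scalar reward per completion."""
    return [0.0 for _ in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # the base model named above
    reward_funcs=reward_fn,
    args=config,
    train_dataset=prompt_dataset,
)
trainer.train()
```

In GRPO, the reward function scores each sampled completion, and the policy is updated from rewards normalized within each group of samples; the placeholder above would need to be replaced with a task-specific reward to reproduce the training described here.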

Potential Use Cases

Given its GRPO-based training, this model is likely to perform well in:

  • Reasoning-intensive tasks: Especially those requiring structured thought processes.
  • Instruction following: Leveraging its Instruct base and fine-tuning.
  • Exploratory applications: For users interested in models trained with advanced reinforcement learning techniques.
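Since the model inherits the Instruct chat format from its base, it can be queried through the standard transformers chat interface. A minimal inference sketch, assuming the checkpoint is available on the Hugging Face Hub under the repository name in the title and that there is enough memory for a 3B BF16 model:

```python
# Minimal inference sketch; running it downloads the BF16 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vkasera/v3_qwen-2.5-3b-r1-countdown-phil"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# An illustrative reasoning prompt; the actual training task is not
# documented in the card.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# max_new_tokens mirrors the 1024-token completion budget used in training.
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Keeping prompts near the 256-token budget and completions within 1024 tokens mirrors the lengths the model saw during GRPO training, though the base model's 32k context remains available.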