vkasera/v4_qwen-2.5-3b-r1-countdown-phil

Text generation · Concurrency cost: 1 · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: Oct 3, 2025 · Architecture: Transformer

The vkasera/v4_qwen-2.5-3b-r1-countdown-phil model is a 3.1 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct using GRPO, the reinforcement learning method introduced in DeepSeekMath. With a context length of 32768 tokens, it is aimed at tasks that require multi-step mathematical reasoning and precise logical deduction.


Model Overview

vkasera/v4_qwen-2.5-3b-r1-countdown-phil is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. It was developed using the TRL library and incorporates a specialized training methodology.

Key Training Details

This model's distinctiveness stems from its training with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO replaces PPO's learned value model with advantages computed relative to a group of sampled completions, which makes it well suited to tasks with verifiable rewards, such as mathematical and logical reasoning.
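The group-relative advantage at the heart of GRPO can be sketched in a few lines. This is a simplified illustration, not TRL's implementation; real implementations differ in details such as whether the sample or population standard deviation is used:

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages for one group of sampled completions.

    Each completion's advantage is its reward normalized by the group's
    mean and standard deviation, so no learned value model is needed.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

With binary (0/1) verifier rewards, as in countdown-style tasks, this simply rewards correct completions and penalizes incorrect ones within each sampled group.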

Technical Specifications

  • Base Model: Qwen2.5-3B-Instruct
  • Parameter Count: 3.1 Billion
  • Context Length: 32768 tokens
  • Training Frameworks: TRL (version 0.23.1), Transformers (version 4.56.2), PyTorch (version 2.7.0)
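Assuming the checkpoint is published on the Hugging Face Hub under this repo id, it can be loaded with the standard Transformers API (a sketch; requires network access plus the `transformers`, `torch`, and `accelerate` packages):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "vkasera/v4_qwen-2.5-3b-r1-countdown-phil"

def load_model(model_id: str = MODEL_ID):
    """Download and load the tokenizer and model from the Hugging Face Hub."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # the checkpoint is published in BF16
        device_map="auto",           # place layers across available devices
    )
    return tokenizer, model
```

Loading in BF16 keeps the weights in their published precision; on hardware without BF16 support, `torch.float16` or `torch.float32` can be substituted.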

Potential Use Cases

Given its fine-tuning with GRPO, this model is likely well-suited for applications requiring:

  • Mathematical problem-solving: Tasks involving complex calculations, proofs, or logical deductions.
  • Reasoning-intensive queries: Scenarios where the model needs to follow multi-step logic to arrive at an answer.
  • Instruction-following: Benefiting from its instruction-tuned base, it can handle diverse user prompts effectively.