vkasera/v2_qwen-2.5-1.5b-r1-countdown-phil

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quantization: BF16 · Context Length: 32k · Published: Oct 5, 2025 · Architecture: Transformer

The vkasera/v2_qwen-2.5-1.5b-r1-countdown-phil model is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning capabilities. With a context length of 32768 tokens, the model is well suited to tasks that require sustained logical and mathematical reasoning.

Model Overview

vkasera/v2_qwen-2.5-1.5b-r1-countdown-phil is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Its context length of 32768 tokens makes it suitable for processing longer inputs and maintaining conversational coherence over extended interactions.
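
If the model is published on the Hugging Face Hub under this repo ID, it can be loaded with the standard transformers API. The snippet below is a minimal sketch: the repo ID and BF16 precision come from the metadata above, and the example prompt is illustrative.

```python
# Minimal inference sketch, assuming the model is available on the
# Hugging Face Hub under the repo ID shown above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vkasera/v2_qwen-2.5-1.5b-r1-countdown-phil"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# The base Instruct model uses a chat template, so format the prompt
# with apply_chat_template rather than passing raw text.
messages = [{"role": "user", "content": "Compute 48 * 17 step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```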

Key Training & Capabilities

This model was fine-tuned using the GRPO (Group Relative Policy Optimization) method, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This specialized training approach suggests an emphasis on:

  • Enhanced Mathematical Reasoning: The GRPO method is specifically designed to improve a model's ability to handle complex mathematical problems and logical deductions (a sketch of its core advantage computation follows this list).
  • Instruction Following: As it's fine-tuned from an "Instruct" model, it retains strong capabilities in understanding and executing user instructions.
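
As a rough illustration of what GRPO changes relative to PPO, the sketch below shows the group-relative advantage computation described in the DeepSeekMath paper: rewards for a group of completions sampled from the same prompt are normalized against each other, removing the need for a learned value function. This is a simplified sketch, not the training code used for this model.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """GRPO-style advantages for one group of sampled completions.

    Instead of using a learned critic, GRPO scores each completion against
    the other completions sampled for the same prompt: the advantage is the
    reward's z-score within the group.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions for one prompt; reward 1.0 when the final answer
# is correct, 0.0 otherwise (a typical verifiable-reward setup).
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # correct answers get positive advantage
```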

Recommended Use Cases

Given its training methodology, this model is particularly well-suited for applications requiring:

  • Mathematical Problem Solving: Tasks involving arithmetic, algebra, geometry, or other mathematical concepts (see the example prompt after this list).
  • Logical Reasoning: Scenarios where the model needs to follow logical steps or deduce conclusions.
  • Complex Instruction Following: Handling detailed and multi-step instructions, especially those with a logical component.
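
To make these use cases concrete, here is a hypothetical prompt in the style of a countdown-type arithmetic puzzle, sent through the transformers pipeline API. The task framing is an assumption suggested by the model's name, not documented training data.

```python
# Hypothetical countdown-style arithmetic prompt; the task framing is an
# assumption based on the model name.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="vkasera/v2_qwen-2.5-1.5b-r1-countdown-phil",
)

messages = [
    {
        "role": "user",
        "content": (
            "Using the numbers 3, 7, 25, and 50 exactly once each, combine "
            "them with +, -, *, and / to reach 85. Reason step by step, "
            "then give the final expression."
        ),
    }
]

result = generator(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])
```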