vkasera/v4_qwen-2.5-3b-r1-countdown-phil

Text generation · Concurrency cost: 1 · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: Oct 3, 2025 · Architecture: Transformer

The vkasera/v4_qwen-2.5-3b-r1-countdown-phil model is a 3.1 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct using GRPO, the reinforcement learning method introduced in DeepSeekMath. With a context length of 32768 tokens, it is aimed at tasks that require multi-step mathematical reasoning and precise logical deduction.


Model Overview

vkasera/v4_qwen-2.5-3b-r1-countdown-phil is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. It was developed using the TRL library and incorporates a specialized training methodology.

Key Training Details

This model's distinctiveness stems from its training with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO replaces PPO's learned value model with advantages computed relative to a group of sampled completions, which makes it well suited to tasks with verifiable rewards, such as mathematical and logical reasoning.
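The group-relative advantage at the heart of GRPO can be sketched in a few lines. This is a simplified illustration, not TRL's implementation; real implementations differ in details such as whether the sample or population standard deviation is used:

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages for one group of sampled completions.

    Each completion's advantage is its reward normalized by the group's
    mean and standard deviation, so no learned value model is needed.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

With binary (0/1) verifier rewards, as in countdown-style tasks, this simply rewards correct completions and penalizes incorrect ones within each sampled group.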

Technical Specifications

  • Base Model: Qwen2.5-3B-Instruct
  • Parameter Count: 3.1 Billion
  • Context Length: 32768 tokens
  • Training Frameworks: TRL (version 0.23.1), Transformers (version 4.56.2), PyTorch (version 2.7.0)
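Assuming the checkpoint is published on the Hugging Face Hub under this repo id, it can be loaded with the standard Transformers API (a sketch; requires network access plus the `transformers`, `torch`, and `accelerate` packages):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "vkasera/v4_qwen-2.5-3b-r1-countdown-phil"

def load_model(model_id: str = MODEL_ID):
    """Download and load the tokenizer and model from the Hugging Face Hub."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # the checkpoint is published in BF16
        device_map="auto",           # place layers across available devices
    )
    return tokenizer, model
```

Loading in BF16 keeps the weights in their published precision; on hardware without BF16 support, `torch.float16` or `torch.float32` can be substituted.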

Potential Use Cases

Given its fine-tuning with GRPO, this model is likely well-suited for applications requiring:

  • Mathematical problem-solving: Tasks involving complex calculations, proofs, or logical deductions.
  • Reasoning-intensive queries: Scenarios where the model needs to follow multi-step logic to arrive at an answer.
  • Instruction-following: Benefiting from its instruction-tuned base, it can handle diverse user prompts effectively.