Model Overview
This model, hyunw3/qwen-2.5-0.5b-r1-countdown_lr5e-6, is a fine-tuned version of the Qwen2.5-0.5B-Instruct base model. At 0.5 billion parameters with a 32,768-token context length, it stays small while still being able to process extensive inputs.
Key Capabilities
- Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning in language models.
- Multilingual Support: Inherits multilingual capabilities from its base, supporting languages such as Chinese, English, French, Spanish, German, and more.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts effectively.
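A minimal inference sketch using the Hugging Face transformers library is shown below. This is not an official usage example from the model authors; the model ID comes from this card, and the Countdown-style prompt is an assumption based on the task named in the model ID.

```python
# Minimal inference sketch (prompt content is an assumption, not from the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hyunw3/qwen-2.5-0.5b-r1-countdown_lr5e-6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user",
     "content": "Using the numbers [4, 7, 25], create an equation that equals 3."}
]
# Format the conversation with the model's chat template before generating.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
completion = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```

Because the model is instruction-tuned, applying the chat template (rather than feeding raw text) is important for getting well-formed responses.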
Training Details
The fine-tuning process used the TRL (Transformer Reinforcement Learning) library. As the model name suggests, training targeted the Countdown arithmetic task with a learning rate of 5e-6. The use of GRPO indicates an optimization strategy aimed at sharpening the model's handling of complex logical and mathematical problems, distinguishing it from general-purpose instruction-tuned models.
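GRPO training of this kind relies on a rule-based, verifiable reward. The exact reward function used for this model is not published here; the sketch below illustrates the general pattern for a Countdown-style task (the `<answer>` tag format and helper name are assumptions): reward 1.0 when the completion contains an arithmetic equation that uses each given number exactly once and evaluates to the target, and 0.0 otherwise.

```python
# Hedged sketch of a verifiable reward for a Countdown-style task,
# as might be passed to TRL's GRPOTrainer. Tag format is an assumption.
import re

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if the completion contains a valid equation, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    equation = match.group(1).strip()
    # Only allow digits, whitespace, and basic arithmetic operators.
    if not re.fullmatch(r"[\d\s+\-*/().]+", equation):
        return 0.0
    # Each provided number must be used exactly once.
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return 0.0
    try:
        # Safe here: the expression was pre-filtered to arithmetic characters.
        value = eval(equation, {"__builtins__": {}}, {})
    except Exception:
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0

# Example: 7 * 4 - 25 = 3 earns full reward; an incomplete equation earns none.
print(countdown_reward("<answer>7 * 4 - 25</answer>", [4, 7, 25], 3))
print(countdown_reward("<answer>7 + 4</answer>", [4, 7, 25], 3))
```

During GRPO training, a reward like this would be computed for each sampled completion in a group, and the group-relative advantages would drive the policy update.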
Good For
- Applications requiring mathematical problem-solving or logical reasoning.
- Use cases where a smaller, efficient model with specialized reasoning capabilities is preferred over larger, more general models.
- Scenarios benefiting from a model capable of processing long contexts while maintaining reasoning performance.