The Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_1p0_0p5_1p0_grpo_42_rule model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring improved mathematical problem-solving and logical deduction, leveraging its base Qwen2.5 architecture and a 32768-token context length.
Overview
This model, developed by Kazuki1450, is a fine-tuned version of the Qwen/Qwen2.5-1.5B-Instruct base model, featuring 1.5 billion parameters and a 32768-token context length. Its primary distinction lies in its training methodology: the GRPO (Group Relative Policy Optimization) method. GRPO, introduced in the DeepSeekMath paper, is specifically designed to push the limits of mathematical reasoning in language models.
Key Capabilities
- Enhanced Mathematical Reasoning: Leverages the GRPO training method to improve performance on mathematical tasks and logical deduction.
- Instruction Following: Built on an instruction-tuned base model, it understands and executes user instructions.
- Qwen2.5 Architecture: Benefits from the robust architecture of the Qwen2.5 series, known for its general language understanding.
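The model can be used like any Qwen2.5-family instruct checkpoint. Below is a minimal usage sketch with the Hugging Face transformers library; the system prompt, generation settings, and the example problem are illustrative choices, not part of this model's documented configuration.

```python
# Minimal inference sketch for this checkpoint via transformers.
# The system prompt and generation parameters below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_1p0_0p5_1p0_grpo_42_rule"

def build_messages(problem: str) -> list:
    """Wrap a math problem in the chat format the instruct model expects."""
    return [
        {"role": "system", "content": "You are a helpful assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]

def generate(problem: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate an answer (downloads weights on first call)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example call (requires downloading the model):
# print(generate("If 3x + 5 = 20, what is x?"))
```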
Training Details
The model was fine-tuned using the Hugging Face TRL library (version 0.29.0). The use of GRPO, as detailed in the DeepSeekMath paper, indicates a focus on complex mathematical problems and multi-step reasoning chains, with the aim of producing more robust and accurate responses on quantitative tasks.
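The core idea behind GRPO is that each prompt gets a group of sampled completions, and every completion is scored relative to its own group rather than by a learned value model. The sketch below illustrates that group-relative advantage computation in plain Python; it is a conceptual illustration, not the actual training code used for this checkpoint, and the group size and reward values shown are made up.

```python
# Illustrative sketch of GRPO's group-relative advantage signal.
# Not the training code for this model; rewards and group size are invented.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Score each sampled completion against its own group:
    advantage_i = (r_i - mean(group)) / std(group)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # Identical rewards across the group carry no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled answers to one math problem, with a rule-based
# reward of 1.0 for a correct final answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is just the group mean, GRPO avoids training a separate value network; correct answers in a mostly-wrong group get a strong positive advantage, and vice versa.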
Good For
- Applications requiring improved mathematical problem-solving.
- Tasks that benefit from enhanced logical reasoning capabilities.
- Developers looking for a compact, instruction-tuned model with a focus on numerical and logical accuracy.