Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 13, 2026 · Architecture: Transformer

Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule is a 1.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning in open language models, making it well suited to applications where precise logical and mathematical problem-solving is crucial. The model supports a 32,768-token (32k) context length for processing extensive inputs.


Model Overview

This model, Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned version of the Qwen/Qwen2.5-1.5B-Instruct base model, with 1.5 billion parameters and a 32,768-token context length. It was developed by Kazuki1450 and trained using the TRL framework.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was fine-tuned with GRPO (Group Relative Policy Optimization), introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024). GRPO is a reinforcement-learning technique specifically designed to improve a model's ability to handle complex mathematical and logical reasoning tasks. A minimal training sketch is shown below.
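Since the model card states that TRL was used, the fine-tuning recipe plausibly resembles TRL's `GRPOTrainer` API. The following sketch is illustrative only: the dataset, the rule-based reward function (suggested by the `_rule` suffix in the model name but otherwise unknown), and all hyperparameters are assumptions, not the author's actual configuration.

```python
# Hypothetical GRPO fine-tuning sketch using TRL; dataset, reward,
# and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset with a "prompt" column (assumption).
train_dataset = load_dataset("trl-lib/tldr", split="train")

def rule_based_reward(completions, **kwargs):
    # Toy rule-based reward favoring concise completions; the actual
    # rule used for this model is not documented.
    return [1.0 if len(c.split()) < 100 else 0.0 for c in completions]

training_args = GRPOConfig(output_dir="qwen2.5-1.5b-grpo", logging_steps=10)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # the stated base model
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```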

Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for:

  • Mathematical problem-solving: tasks that require step-by-step logical deduction and numerical accuracy.
  • Reasoning-intensive applications: scenarios where robust logical inference is paramount.
  • Instruction following: building on its instruction-tuned base, it can follow complex directives, especially those with a reasoning component.

Developers can integrate this model with the Hugging Face transformers library for text generation, as in the sketch below.
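A minimal inference sketch, assuming the standard Qwen2.5 chat template is bundled with the checkpoint and using the transformers text-generation pipeline; the prompt is illustrative.

```python
# Minimal inference sketch using the transformers pipeline;
# the prompt is an illustrative example.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule",
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
)

messages = [
    {"role": "user", "content": "If a train travels 120 km in 1.5 hours, what is its average speed?"},
]
output = generator(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the reply.
print(output[0]["generated_text"][-1]["content"])
```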