Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule
Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule is a 1.5-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and supports a context length of 131072 tokens, making it suited to tasks that require extended logical and mathematical problem-solving.
Overview
This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen2.5-1.5B-Instruct base model, with 1.5 billion parameters and a context length of 131072 tokens. It was trained using GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach indicates a focus on improving the model's ability to handle complex reasoning tasks, particularly in mathematical domains.
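To try the model, loading it through the Hugging Face transformers library should look like the sketch below. This is a generic example using the standard AutoModelForCausalLM API, not a snippet taken from the repository itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule"

# Standard causal-LM loading; device_map="auto" places the weights
# on a GPU when one is available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```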
Key Capabilities
- Enhanced Reasoning: GRPO training targets improved logical and mathematical reasoning.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts accurately and produce relevant responses (see the generation sketch after this list).
- Large Context Window: The 131072-token context length allows it to process extensive inputs, which is useful for multi-step problem-solving and long-form content generation.
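As a concrete illustration of instruction-following inference, the sketch below applies the tokenizer's chat template to a simple math question and generates a response. The prompt is a hypothetical example, and `model` and `tokenizer` are assumed to be loaded as in the snippet above.

```python
# Assumes `model` and `tokenizer` from the loading sketch above.
messages = [
    {"role": "user", "content": "If 3x + 7 = 22, what is x?"},  # hypothetical prompt
]

# apply_chat_template formats the conversation the way the
# instruction tuning expects and returns input token ids.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```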
Training Details
The model was fine-tuned with the TRL (Transformer Reinforcement Learning) library using the GRPO method. GRPO is described in the DeepSeekMath paper, which focuses on advancing mathematical reasoning in open language models.
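The exact training configuration for this checkpoint is not published here. As an illustration only, a GRPO fine-tune with TRL typically follows the pattern below; the dataset and the rule-based reward function are hypothetical placeholders, not the ones used to train this model.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt dataset; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("trl-lib/tldr", split="train")

def rule_based_reward(completions, **kwargs):
    # Hypothetical rule-based reward: score 1.0 when the completion
    # contains an explicit answer marker, 0.0 otherwise.
    return [1.0 if "answer" in c.lower() else 0.0 for c in completions]

training_args = GRPOConfig(output_dir="Qwen2.5-1.5B-Instruct-GRPO")
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

During training, GRPO samples a group of completions per prompt and scores each one relative to the group average, so the reward function above is the signal from which the group-relative advantages are computed.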
When to Use This Model
This model is best suited to applications that demand strong reasoning, especially mathematical or logical problem-solving. Its instruction-tuned nature makes it versatile across NLP tasks that need precise, context-aware responses, and its large context window lets it handle long, detailed queries.