Overview
This model, notbdq/Qwen2.5-14B-Instruct-1M-GRPO-Reasoning, is a specialized fine-tune of the Qwen2.5-14B-Instruct-1M base model. Developed by notbdq, it applies the GRPO (Group Relative Policy Optimization) technique, trained on the NuminaMath CoT (Chain-of-Thought) dataset. The primary goal of this fine-tuning is to strengthen the model's ability to perform complex reasoning tasks, particularly those requiring a step-by-step thought process.
Key Capabilities
- Explicit Reasoning: The model is designed to first generate a reasoning process within <think> tags before producing the final answer in <answer> tags, mimicking human problem-solving. This structured output is enforced through its instruction format.
- Enhanced Problem Solving: Initial benchmarks on a subset of the AIME validation set suggest improved performance over the base Qwen2.5-14B-Instruct-1M model on mathematical and logical challenges.
- GRPO Technique: The application of GRPO aims to guide the model towards more robust and verifiable reasoning paths.
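The <think>/<answer> convention described above lends itself to simple post-processing. The sketch below shows one way to split a response into its reasoning trace and final answer; the `parse_reasoning` helper is illustrative and not part of the model's own tooling.

```python
import re


def parse_reasoning(text: str):
    """Split a model response into (reasoning, answer) using the
    <think>/<answer> tags the model is trained to emit."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


sample = "<think>2 + 2 equals 4.</think><answer>4</answer>"
reasoning, answer = parse_reasoning(sample)
print(reasoning)  # 2 + 2 equals 4.
print(answer)     # 4
```

Returning `None` for a missing tag makes it easy to detect responses where the model failed to follow the format.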
Benchmarking and Limitations
- Preliminary Benchmarks: The developer has conducted preliminary tests on 15 samples of the AIME validation set, showing better performance than the base Qwen 2.5 1M model. A benchmarking script is provided for community evaluation.
- Known Issues: The model may fall into unbounded (infinite) generation on particularly difficult problems, and response lengths were observed to grow over the course of training.
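Given the infinite-generation issue, it is prudent to cap decoding length at inference time. A minimal sketch using Hugging Face transformers follows; the parameter values (`max_new_tokens`, `repetition_penalty`) are illustrative assumptions, not settings recommended by the model author.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "notbdq/Qwen2.5-14B-Instruct-1M-GRPO-Reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=2048,      # hard cap so generation always terminates
    repetition_penalty=1.05,  # mild penalty against degenerate loops
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

A hard token budget guarantees termination at the cost of possibly truncating very long reasoning traces, so the cap should be sized to the difficulty of the problems being posed.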
When to Use This Model
This model is particularly well-suited for use cases requiring:
- Mathematical and Logical Reasoning: Applications where detailed, step-by-step solutions are crucial.
- Explainable AI: Scenarios where understanding the model's thought process is as important as the final answer.
- Educational Tools: Generating explanations for complex problems.
While initial results are promising, comprehensive benchmarking is encouraged to fully assess the model's capabilities across a wider range of reasoning tasks.