Name: clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-rapid-red-summit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: clijo

Model Overview

This model, clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-rapid-red-summit, is a fine-tuned variant of the Qwen3-4B-Instruct-2507 base model. It has been specifically trained using the GRPO (Gradient-based Reward Policy Optimization) method, a technique highlighted in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach aims to improve the model's ability to handle complex reasoning tasks.

Key Capabilities

Enhanced Reasoning: Benefits from GRPO training, suggesting improved performance in tasks requiring logical deduction and problem-solving.
Instruction Following: As an instruction-tuned model, it is designed to accurately interpret and execute user prompts.
Mathematical Aptitude: The underlying GRPO method's origin in mathematical reasoning research implies a potential strength in mathematical and logical tasks.

Training Details

The model was fine-tuned using the TRL (Transformers Reinforcement Learning) library. The training process utilized specific versions of frameworks including TRL 1.5.1, Transformers 5.9.0, Pytorch 2.11.0+cu130, Datasets 4.8.5, and Tokenizers 0.22.2. Further details on the training run are available via Weights & Biases.

Good For

Applications requiring robust instruction following.
Tasks that benefit from enhanced reasoning capabilities, especially those with a mathematical or logical component.
Developers looking for a 4B parameter model with specialized training for complex problem-solving.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)