Name: clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-rapid-lime-orbit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: clijo

Model Overview

This model, clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-rapid-lime-orbit, is an instruction-tuned variant of the Qwen3-4B-Instruct-2507 base model, with 4 billion parameters and a 32768-token context window. It was developed using the TRL (Transformer Reinforcement Learning) library.

Key Differentiator: GRPO Training

A significant aspect of this model's training is the application of GRPO (Group Relative Policy Optimization). This reinforcement learning method was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), where it was used to improve a model's proficiency in mathematical reasoning tasks. This suggests the model may handle complex numerical and logical problems better than comparable models not trained with this technique.
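As a rough illustration of the core idea in the cited paper, GRPO scores each sampled completion relative to the other completions sampled for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative normalization step. The function name, the example rewards, the use of the population standard deviation, and the `eps` term are assumptions made for illustration; this is not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is an illustrative guard against division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for one prompt
# (e.g. 1.0 if the final answer was correct, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # approximately [1, -1, -1, 1]
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below receive a negative one. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.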

Intended Use Cases

Given its fine-tuning with GRPO, this model is particularly well-suited to applications that demand robust mathematical and logical reasoning. It may appeal to developers looking for a compact yet capable model for tasks such as:

Solving mathematical word problems
Generating logical explanations
Assisting with data analysis interpretations

This model is a specialized option within the 4B parameter class, using GRPO training to target gains in reasoning performance.
