Name: clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-golden-violet-vector API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: clijo

Model Overview

This model, clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-golden-violet-vector, is a 4-billion parameter instruction-tuned variant of the Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using the TRL (Transformers Reinforcement Learning) library.

Key Differentiator: GRPO Training

A significant aspect of this model's training is the application of GRPO (Gradient Regularized Policy Optimization). This method, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), suggests an optimization for enhancing mathematical reasoning abilities in language models. This indicates the model is specifically geared towards improving performance on complex reasoning tasks.

Capabilities & Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for:

Mathematical reasoning: Solving problems that require logical deduction and numerical understanding.
Instruction following: Executing complex instructions accurately due to its instruction-tuned nature.
General language generation: Handling a wide range of text generation tasks, benefiting from the Qwen3-4B-Instruct base.

With a context length of 32768 tokens, it can process and generate longer, more intricate responses, making it suitable for applications requiring extensive context understanding.

Overview

Model Overview

Key Differentiator: GRPO Training

Capabilities & Use Cases

Full Model Card (README)