Name: clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-rapid-orange-quartz API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: clijo

Model Overview

This model, clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-rapid-orange-quartz, is an instruction-tuned variant of the Qwen3-4B-Instruct-2507 base model. It features 4 billion parameters and supports a substantial context length of 32768 tokens, making it suitable for processing longer inputs and complex queries.

Key Capabilities

Enhanced Mathematical Reasoning: The model was specifically fine-tuned using the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the DeepSeekMath research. This training approach aims to significantly improve its performance on mathematical reasoning tasks.
Instruction Following: As an instruction-tuned model, it is designed to accurately understand and execute user instructions, providing relevant and coherent responses.
Foundation Model: Built upon the Qwen3-4B-Instruct-2507 architecture, it inherits a strong foundation for general language understanding and generation.

Training Details

The fine-tuning process utilized the TRL (Transformers Reinforcement Learning) framework. The application of GRPO, a technique highlighted in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" arXiv:2402.03300, indicates a focus on specialized reasoning abilities. This makes it a strong candidate for applications where precise logical and mathematical outputs are critical.

Overview

Model Overview

Key Capabilities

Training Details

Full Model Card (README)