maheshrawat18/Qwen3-4B-GRPO-sft
maheshrawat18/Qwen3-4B-GRPO-sft is a 4-billion-parameter Qwen3-based causal language model developed by maheshrawat18. It was fine-tuned from maheshrawat18/Qwen3-4B-Thinking-2507-merged, with training accelerated using Unsloth and Hugging Face's TRL library. It features a 32,768-token context length, making it suitable for tasks that require processing longer inputs.
Model Overview
maheshrawat18/Qwen3-4B-GRPO-sft is a 4-billion-parameter language model based on the Qwen3 architecture, developed by maheshrawat18. It is a fine-tuned version of maheshrawat18/Qwen3-4B-Thinking-2507-merged.
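The card does not include usage code. A minimal loading-and-generation sketch with the Hugging Face transformers library might look as follows; the prompt text, device settings, and generation parameters are illustrative assumptions, not part of the card:

```python
# Sketch: loading maheshrawat18/Qwen3-4B-GRPO-sft with transformers.
# Settings below (device_map, max_new_tokens) are illustrative, not prescribed by the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "maheshrawat18/Qwen3-4B-GRPO-sft"


def build_chat_prompt(tokenizer, user_message: str) -> str:
    """Render a single-turn chat prompt via the tokenizer's chat template."""
    messages = [{"role": "user", "content": user_message}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


def main() -> None:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = build_chat_prompt(tokenizer, "Summarize attention in one sentence.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)

    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Keeping the prompt construction in a small helper makes it easy to swap in multi-turn message lists later without touching the generation code.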
Key Characteristics
- Efficient Training: The model was trained roughly 2x faster by leveraging Unsloth together with Hugging Face's TRL library, reflecting an emphasis on training efficiency.
- Context Length: It supports a 32,768-token context window, allowing it to handle long inputs and generate coherent, long-form responses.
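One practical consequence of the fixed context window is that prompt length and generation length share the same budget. The helper below is an illustrative sketch (not part of the model or its tooling) of reserving headroom for generated tokens:

```python
# Illustrative helper (hypothetical, not shipped with the model): budget the
# prompt against the 32,768-token context window, reserving room for output.
CONTEXT_LENGTH = 32768


def max_prompt_tokens(max_new_tokens: int) -> int:
    """Tokens available for the prompt once generation headroom is reserved."""
    if not 0 <= max_new_tokens <= CONTEXT_LENGTH:
        raise ValueError("max_new_tokens must fit within the context window")
    return CONTEXT_LENGTH - max_new_tokens


# Reserving 1,024 tokens for the reply leaves 31,744 tokens for the prompt.
print(max_prompt_tokens(1024))  # → 31744
```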
Potential Use Cases
Given its efficient training methodology and considerable context window, this model is well-suited for applications where:
- Processing and understanding long documents or conversations is crucial.
- Rapid iteration and fine-tuning on custom datasets are desired due to its optimized training process.
- General language understanding and generation tasks are required within a 4-billion-parameter footprint.
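Since the card highlights the Unsloth + TRL training setup, a hypothetical supervised fine-tuning configuration is sketched below. The dataset path, LoRA settings, and hyperparameters are all illustrative assumptions, not the recipe actually used for this model:

```python
# Hypothetical SFT config sketch (assumed Unsloth + TRL workflow, not the
# card's actual training recipe; all hyperparameters are placeholders).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Load the base model the card names, using Unsloth's fast loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="maheshrawat18/Qwen3-4B-Thinking-2507-merged",
    max_seq_length=32768,  # matches the card's stated context length
    load_in_4bit=True,     # illustrative memory-saving choice
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Placeholder dataset: a local JSONL file with a "text" field.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=100,
        output_dir="outputs",
    ),
)
trainer.train()
```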