maheshrawat18/Qwen3-4B-GRPO-sft

Text generation · Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Apr 24, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

maheshrawat18/Qwen3-4B-GRPO-sft is a 4-billion-parameter Qwen3-based causal language model developed by maheshrawat18. It was fine-tuned from maheshrawat18/Qwen3-4B-Thinking-2507-merged, with training accelerated using Unsloth and Hugging Face's TRL library. It supports a 32,768-token context length, making it suitable for tasks that require processing longer inputs.
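For quick orientation, here is a minimal inference sketch using the standard Transformers auto classes. The prompt and generation settings are illustrative assumptions; the card does not prescribe them.

```python
# Minimal inference sketch (assumes the repo loads via the standard
# Transformers auto classes; prompt and settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "maheshrawat18/Qwen3-4B-GRPO-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain what a 32k context window is useful for."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```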


Model Overview

maheshrawat18/Qwen3-4B-GRPO-sft is a 4-billion-parameter language model based on the Qwen3 architecture and developed by maheshrawat18. It is a fine-tuned version of maheshrawat18/Qwen3-4B-Thinking-2507-merged.

Key Characteristics

  • Efficient Training: The model was trained 2x faster by leveraging Unsloth and Hugging Face's TRL library (a sketch of such a setup appears after this list).
  • Context Length: It supports a context length of 32,768 tokens, allowing it to handle extensive inputs and generate coherent, long-form responses.
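The card names Unsloth and TRL but does not publish the training recipe, so the following is a hypothetical sketch of the kind of setup involved. The toy dataset, LoRA settings, and hyperparameters are assumptions for illustration, and keyword names follow common Unsloth examples (they can vary across TRL versions).

```python
# Hypothetical Unsloth + TRL SFT setup; dataset, LoRA config, and
# hyperparameters below are illustrative assumptions, not the card's recipe.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="maheshrawat18/Qwen3-4B-Thinking-2507-merged",  # base model per the card
    max_seq_length=32768,  # matches the advertised context length
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Toy dataset with a plain "text" field; replace with a real corpus.
train_data = Dataset.from_dict({
    "text": ["### Question: What is GRPO?\n### Answer: A policy-optimization method."]
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_data,
    args=SFTConfig(
        per_device_train_batch_size=2,
        max_steps=60,
        output_dir="outputs",
        dataset_text_field="text",
    ),
)
trainer.train()
```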

Potential Use Cases

Given its efficient training setup and large context window, this model is well suited to applications where:

  • Processing and understanding long documents or conversations is crucial.
  • Rapid iteration and fine-tuning on custom datasets are desired due to its optimized training process.
  • General language understanding and generation tasks are required within a 4-billion-parameter footprint.
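For the long-document case, a small sketch like the one below can check whether an input fits the 32,768-token window before prompting in a single pass. The file path and the reserved-token margin are illustrative assumptions.

```python
# Sketch: verify a long document fits the 32,768-token window before
# prompting in one pass (file path and reserve margin are illustrative).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("maheshrawat18/Qwen3-4B-GRPO-sft")

CTX_LIMIT = 32768
RESERVE = 1024  # leave room for the instruction and the generated reply

with open("long_report.txt") as f:
    document = f.read()

n_tokens = len(tokenizer.encode(document))
if n_tokens <= CTX_LIMIT - RESERVE:
    print(f"{n_tokens} tokens: fits in one pass.")
else:
    print(f"{n_tokens} tokens: exceeds the window; chunk or summarize in stages.")
```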