maheshrawat18/Qwen3-4B-2507-sft-merged-thinking-final
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights
maheshrawat18/Qwen3-4B-2507-sft-merged-thinking-final is a 4-billion-parameter Qwen3-based causal language model developed by maheshrawat18. Fine-tuned from unsloth/Qwen3-4B-Thinking-2507, it was trained with Unsloth and Hugging Face's TRL library, enabling 2x faster training. It supports a context length of 32,768 tokens and is suitable for general language generation tasks where efficient training and a substantial context window are beneficial.
Overview
maheshrawat18/Qwen3-4B-2507-sft-merged-thinking-final is a 4-billion-parameter language model based on the Qwen3 architecture, developed by maheshrawat18 and fine-tuned from the unsloth/Qwen3-4B-Thinking-2507 base model.
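A minimal inference sketch follows, assuming the model loads through the standard transformers causal-LM API and follows the Qwen3 chat template; the prompt, max_new_tokens, and dtype/device settings are illustrative rather than recommended values.

```python
# Minimal inference sketch using Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "maheshrawat18/Qwen3-4B-2507-sft-merged-thinking-final"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize rotary position embeddings in two sentences."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the base model is a thinking variant, generations may open with a `<think>...</think>` reasoning span before the final answer; downstream code may want to strip or display that span separately.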
Key Capabilities
- Efficient Training: Leverages Unsloth and Hugging Face's TRL library, which the card reports yields a 2x speedup during training (see the sketch after this list).
- Substantial Context Window: Supports a context length of 32768 tokens, allowing for processing and generating longer sequences of text.
- Qwen3 Architecture: Benefits from the underlying Qwen3 architecture, providing a strong foundation for various natural language processing tasks.
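For context, here is an illustrative sketch of the kind of Unsloth + TRL supervised fine-tuning recipe the card describes. The base model name and context length come from the card; the dataset, LoRA settings, and hyperparameters are placeholders, not the values used to produce this checkpoint.

```python
# Hypothetical SFT recipe: everything except the base model and
# max_seq_length is a placeholder chosen for illustration.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-Thinking-2507",  # stated base model
    max_seq_length=32768,                         # stated context window
)
# Attach LoRA adapters; Unsloth patches the model for faster training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Toy dataset using the default "text" field expected by SFTTrainer.
train_dataset = Dataset.from_list(
    [{"text": "### Question: What is 2 + 2?\n### Answer: 4"}] * 32
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # newer TRL name for the tokenizer argument
    train_dataset=train_dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=30,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```

Merging the trained adapters back into the base weights would produce a standalone checkpoint, which is presumably what the "merged" in the model name refers to.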
Good for
- Applications requiring a 4 billion parameter model with an extended context length.
- Scenarios where efficient fine-tuning is a priority.
- General text generation and understanding tasks building upon the Qwen3 family.