Name: 96kevinli29/Qwen3-4B-SFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: 96kevinli29

Qwen3-4B-SFT: A Reasoning-Focused Pre-RL Base Model

Qwen3-4B-SFT is a 4-billion-parameter model developed by 96kevinli29, fine-tuned from Qwen3-4B-Base with the verl framework. It provides a reproducible "pre-RL SFT base" checkpoint that is math-forward, reasoning-focused, and format-aligned. The model is well suited as a clean warm-start base for reinforcement learning (RL) research and can also be applied directly to complex reasoning tasks.

Key Capabilities

Enhanced Reasoning: Optimized for chain-of-thought (CoT) reasoning, with clear gains over Qwen3-4B-Base on reasoning benchmarks.
Mathematical Proficiency: Demonstrates strong performance on mathematical benchmarks such as AIME and AMC.
Reproducible SFT Base: Provides a practical and open-source intermediate checkpoint for RL alignment research.
Qwen Chat Template Alignment: Trained to use the Qwen chat template, ending responses with <|im_end|>, ensuring consistent formatting.
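The template alignment described above can be illustrated with a minimal sketch. It assumes the ChatML-style layout used by Qwen chat templates, in which each turn opens with <|im_start|> followed by the role and closes with <|im_end|>. Only <|im_end|> is named on this page; the <|im_start|> marker and the helper names build_prompt and trim_response are illustrative assumptions. In practice the tokenizer's own chat template should be treated as the authoritative format.

```python
# Sketch of ChatML-style prompt rendering and response trimming.
# Assumption: turns are wrapped as <|im_start|>{role}\n{content}<|im_end|>\n.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_prompt(messages):
    """Render {"role", "content"} dicts as ChatML turns and open an assistant turn."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    parts.append(f"{IM_START}assistant\n")  # the model continues from here
    return "".join(parts)


def trim_response(raw):
    """Keep only the text before the first <|im_end|> marker."""
    return raw.split(IM_END, 1)[0].strip()


prompt = build_prompt([{"role": "user", "content": "What is 7 * 8?"}])
reply = trim_response("7 * 8 = 56<|im_end|>\nextra tokens")
```

Because the model is trained to close its answer with <|im_end|>, that marker can serve as a stop sequence, and anything generated after it can be discarded as in trim_response.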

Benchmark Highlights

Qwen3-4B-SFT outperforms both Qwen3-4B-Base and the SFT (RLCER) baseline on challenging reasoning and math datasets:

AIME 2024: Achieves 20.8% (vs. 11.25% for Base, 17.29% for SFT (RLCER)).
GPQA-Diamond: Scores 29.1% (vs. 7.77% for Base, 24.43% for SFT (RLCER)).
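To make the size of these gaps explicit, the short sketch below computes the absolute gains in percentage points from the figures quoted above. The dictionary layout and the gains helper are illustrative; only the scores themselves come from this page.

```python
# Absolute gains (percentage points) of Qwen3-4B-SFT over the two baselines.
# Scores are copied from the benchmark highlights; the structure is illustrative.
scores = {
    "AIME 2024": {"sft": 20.8, "base": 11.25, "rlcer": 17.29},
    "GPQA-Diamond": {"sft": 29.1, "base": 7.77, "rlcer": 24.43},
}


def gains(row):
    """Return the gain of the SFT model over each baseline, rounded to 2 decimals."""
    return {
        "vs_base": round(row["sft"] - row["base"], 2),
        "vs_rlcer": round(row["sft"] - row["rlcer"], 2),
    }


deltas = {name: gains(row) for name, row in scores.items()}
for name, d in deltas.items():
    print(f"{name}: +{d['vs_base']} pts vs. Base, +{d['vs_rlcer']} pts vs. SFT (RLCER)")
```

The gain over the base model is much larger than the gain over SFT (RLCER) on both datasets, with the widest gap on GPQA-Diamond.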

Good for

Serving as a warm-start base for Reinforcement Learning (RL) experiments and alignment research.
Applications requiring strong mathematical and complex reasoning capabilities.
Developers seeking a reproducible and well-aligned SFT checkpoint for further fine-tuning.

Limitations

Not optimized for factual correctness in every domain.
May still exhibit hallucinations or produce unsafe content.
Performance can be sensitive to specific prompt styles and decoding configurations.
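Since results can vary with decoding settings, it may help to pin those settings explicitly and compare them side by side. The sketch below builds two request bodies in the widely used OpenAI-style chat-completions shape. Whether a given serving endpoint accepts exactly these fields is an assumption, and the temperature, top_p, and max_tokens values are illustrative rather than recommended settings for this model.

```python
import json

# Model identifier as listed on this page.
MODEL_ID = "96kevinli29/Qwen3-4B-SFT"


def make_request(question, temperature, top_p, max_tokens=2048):
    """Build an OpenAI-style chat-completions body (field names are an assumption)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


# Two illustrative decoding configurations to compare on the same prompt.
configs = {
    "greedy": {"temperature": 0.0, "top_p": 1.0},
    "sampled": {"temperature": 0.7, "top_p": 0.95},
}

question = "Find all real x such that x^2 - 5x + 6 = 0."
requests_by_name = {name: make_request(question, **cfg) for name, cfg in configs.items()}
body = json.dumps(requests_by_name["sampled"], indent=2)
```

Keeping the prompt fixed while varying only the decoding fields makes it easier to attribute any change in answer quality to the configuration rather than to the prompt wording.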