SeaFill2025/Qwen3-4B-SFT

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Mar 22, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

SeaFill2025/Qwen3-4B-SFT-Math is a 4 billion parameter Qwen3-based model developed by the Sea-Fill Community, specifically fine-tuned for long-think mathematical reasoning. Derived from Qwen3-4B-Base, it excels in complex math problems, demonstrating significant improvements on benchmarks like AIME and AMC. This model is optimized for Chain-of-Thought (CoT) and instruction following in mathematical contexts, serving as a robust warm-start for reinforcement learning research.

Qwen3-4B-SFT-Math: Specialized for Mathematical Reasoning

Qwen3-4B-SFT-Math is a 4 billion parameter model from the Sea-Fill Community, fine-tuned from Qwen3-4B-Base using a pure long-think math recipe at a ~45K scale. This model addresses the need for reproducible 'warm-start' SFT bases, bridging the gap between base models and reinforcement learning models, particularly for math-focused applications.

Key Capabilities & Features

  • Exceptional Math Reasoning: Demonstrates substantial performance gains in mathematical reasoning, with Pass@1 accuracy improvements of +20.62% on AIME 2025, +19.79% on AIME 2026, and +42.81% on AMC 2023 compared to its base model.
  • Optimized for CoT: Aligned for Chain-of-Thought (CoT) and instruction following, making it suitable for complex problem-solving requiring detailed step-by-step reasoning.
  • Warm-Start for RL: Designed as a robust SFT-only baseline for reinforcement learning (RL) research, allowing for further alignment studies.
  • Qwen Chat Template: Trained with the Qwen chat template, so responses are expected to end with <|im_end|>. Users should set eos_token_id to 151645 so that generation stops at that token.
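
The chat-template point above can be sketched in a few lines. This is a minimal illustration, not the model's official loading code: it assumes the standard ChatML layout used by Qwen chat templates, and the helper names (build_chatml_prompt, generation_kwargs, trim_at_end_marker) are hypothetical. The template shipped with the model's tokenizer remains authoritative and may add elements this sketch omits.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
EOS_TOKEN_ID = 151645  # id of <|im_end|> as stated in the model card


def build_chatml_prompt(messages):
    """Render chat messages in ChatML layout (assumed), ending with an
    open assistant turn so the model writes the reply."""
    parts = [
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    ]
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def generation_kwargs(max_new_tokens):
    """Keyword arguments for a generate call: a token budget plus the
    end-of-turn token id, so decoding stops at <|im_end|>."""
    return {"max_new_tokens": max_new_tokens, "eos_token_id": EOS_TOKEN_ID}


def trim_at_end_marker(text):
    """Drop everything from the first <|im_end|> marker onward, in case
    the decoded text still contains it."""
    return text.split(IM_END, 1)[0].strip()


if __name__ == "__main__":
    prompt = build_chatml_prompt(
        [{"role": "user", "content": "Find all real x with x^2 = 9."}]
    )
    print(prompt)
    print(generation_kwargs(16384))
```

With Hugging Face transformers, the dictionary from generation_kwargs could be passed along to model.generate, and tokenizer.apply_chat_template would normally replace the hand-written prompt builder.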

Use Cases & Limitations

  • Good for: Pure mathematical reasoning tasks, especially those requiring long-think processes and detailed derivations. Ideal for researchers exploring SFT-to-RL alignment in math domains.
  • Limitations: This model received math-only SFT and is not optimized for general-domain reasoning, factuality, or instruction following outside of mathematics. It may produce hallucinations or unsafe outputs in non-math contexts. Long rollouts are common: a significant fraction hit the 16K token cap on hard problems, so a larger token budget (e.g., 32K) is advisable for AIME-level evaluations.
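
To judge whether a token budget is too small, it can help to measure how many rollouts reach the cap alongside accuracy. The sketch below assumes each rollout is recorded as a dictionary with hypothetical keys num_tokens and correct; it is not the authors' evaluation harness, and Pass@1 is taken here simply as the fraction of sampled rollouts that are correct.

```python
def rollout_stats(rollouts, token_cap):
    """Summarize a batch of rollouts.

    rollouts: list of dicts with 'num_tokens' (int, generated tokens)
              and 'correct' (bool, final answer matched the reference).
    token_cap: generation budget; a rollout that reaches it is counted
               as truncated.
    """
    if not rollouts:
        raise ValueError("rollouts must not be empty")
    n = len(rollouts)
    truncated = sum(1 for r in rollouts if r["num_tokens"] >= token_cap)
    correct = sum(1 for r in rollouts if r["correct"])
    return {
        "pass_at_1": correct / n,
        "truncated_fraction": truncated / n,
    }


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        {"num_tokens": 16384, "correct": False},
        {"num_tokens": 9120, "correct": True},
        {"num_tokens": 16384, "correct": False},
        {"num_tokens": 4300, "correct": False},
    ]
    print(rollout_stats(sample, token_cap=16384))
```

If truncated_fraction is high at 16K, rerunning with a 32K budget and comparing pass_at_1 shows how much of the error comes from truncation rather than from reasoning.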