tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2

Visibility: Public
Parameters: 8B
Precision: FP8
Context length: 32,768 tokens
Added: Jan 1, 2026
License: apache-2.0
Hosted on: Hugging Face

Qwen3-Swallow-8B-SFT-v0.2 is an 8-billion-parameter instruction-tuned large language model developed by tokyotech-llm, based on the Qwen3 architecture. The model is optimized for bilingual Japanese-English proficiency while maintaining and enhancing performance on mathematical and coding tasks through Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning with Verifiable Rewards (RLVR). Its primary use case is applications that require strong performance in both Japanese and English, particularly those involving STEM-related reasoning and code generation.

Overview

Qwen3-Swallow-8B-SFT-v0.2 is an 8 billion parameter model from the Qwen3-Swallow family, developed by tokyotech-llm. This model is a Supervised Fine-Tuned (SFT) variant, building upon a Qwen3 base model through a rigorous training pipeline that includes Continual Pre-Training (CPT), SFT, and Reinforcement Learning with Verifiable Rewards (RLVR). The development focused on creating a highly proficient bilingual Japanese-English model that also excels in complex reasoning tasks, particularly in mathematics and coding.

Key Capabilities

  • Bilingual Proficiency: Optimized for high performance in both Japanese and English language understanding and generation.
  • Retained STEM Performance: Avoids catastrophic forgetting in mathematics and coding during continual pre-training and fine-tuning by training on high-quality math and code datasets with reasoning traces.
  • Enhanced Reasoning: Demonstrates reasoning capabilities on par with, and in some tasks, surpassing the original Qwen3 models, further improved through RLVR.
  • Instruction Following: Fine-tuned on 2.1 million samples (for the 8B model), including Japanese and English chat datasets, to improve instruction adherence; a brief usage sketch follows this list.
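
As a minimal sketch of chat-style usage, the snippet below loads the model with Hugging Face transformers and builds the prompt via the tokenizer's built-in chat template. The example prompt and sampling parameters are illustrative assumptions, not settings recommended by the model authors.

```python
# Minimal sketch: chat-style inference with Hugging Face transformers.
# The prompt and sampling values are illustrative assumptions, not
# official recommendations from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    # "What is the capital of Japan?"
    {"role": "user", "content": "日本の首都はどこですか?"},
]

# Build input ids using the tokenizer's chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```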

Good for

  • Applications requiring robust Japanese and English language processing.
  • Tasks involving mathematical problem-solving and code generation.
  • Use cases where strong reasoning capabilities are critical.
  • Developers seeking a model debugged and evaluated primarily with vLLM for reliable inference (see the vLLM sketch after this list).
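
Since the card reports that the model was debugged and evaluated primarily with vLLM, here is a minimal offline-inference sketch using that engine. The sampling settings and example prompt are illustrative assumptions only.

```python
# Minimal sketch: offline inference with vLLM, the engine the model card
# reports as the primary one used for debugging and evaluation.
# Sampling values below are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2",
    max_model_len=32768,  # matches the advertised context length
)
sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

# chat() applies the model's own chat template before generation.
conversation = [
    # "Please introduce yourself in both Japanese and English."
    {"role": "user", "content": "日本語と英語の両方で自己紹介してください。"},
]
outputs = llm.chat(conversation, sampling_params=sampling)
print(outputs[0].outputs[0].text)
```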