Qwen3-Swallow-8B-SFT-v0.2 is an 8-billion-parameter instruction-tuned large language model developed by tokyotech-llm, based on the Qwen3 architecture. The model is optimized for bilingual Japanese-English proficiency while maintaining and enhancing performance on mathematical and coding tasks through Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning with Verifiable Rewards (RLVR). Its primary use case is applications requiring strong performance in both Japanese and English, particularly those involving STEM-related reasoning and code generation.
Qwen3-Swallow-8B-SFT-v0.2 Overview
Qwen3-Swallow-8B-SFT-v0.2 is the 8-billion-parameter member of the Qwen3-Swallow family. It is a Supervised Fine-Tuned (SFT) variant, built on a Qwen3 base model through a training pipeline of Continual Pre-Training (CPT), SFT, and Reinforcement Learning with Verifiable Rewards (RLVR). Development focused on producing a highly proficient bilingual Japanese-English model that also excels at complex reasoning, particularly mathematics and coding.
Key Capabilities
- Bilingual Proficiency: Optimized for high performance in both Japanese and English language understanding and generation.
- Retained STEM Performance: Avoids catastrophic forgetting in mathematics and coding during continual pre-training and fine-tuning by using high-quality math and code datasets with reasoning traces.
- Enhanced Reasoning: Demonstrates reasoning capabilities on par with the original Qwen3 models, surpassing them on some tasks, with further gains from RLVR.
- Instruction Following: Fine-tuned on 2.1 million samples (for the 8B model), including Japanese and English chat datasets, to improve instruction adherence.
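Qwen-family models use a ChatML-style conversation format for instruction following. The sketch below hand-builds such a prompt for a bilingual exchange; the exact special tokens are an assumption based on the published Qwen template, and in practice you would let the tokenizer's `apply_chat_template` handle this for you.

```python
def to_chatml(messages: list[dict]) -> str:
    """Render messages in the ChatML-style format used by Qwen-family models,
    ending with the assistant header so the model continues from there.
    (Token names assumed from the Qwen template, not from this model card.)"""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

# A bilingual instruction: system prompt in English, user request in Japanese.
conversation = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "次の英文を日本語に訳してください: 'Hello, world.'"},
]
prompt = to_chatml(conversation)
```

Rendering the assistant header last (without a closing tag) is what cues the model to generate the assistant's turn.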
Good for
- Applications requiring robust Japanese and English language processing.
- Tasks involving mathematical problem-solving and code generation.
- Use cases where strong reasoning capabilities are critical.
- Deployments using vLLM, the inference engine with which the model was primarily debugged and evaluated.
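Since the model was debugged and evaluated primarily with vLLM, serving it through vLLM's OpenAI-compatible endpoint is a natural deployment path. The sketch below is a minimal, hypothetical example (the Hugging Face repo name, port, and sampling parameters are assumptions, not confirmed values): it assembles a chat-completion payload and sends it to a locally running server.

```python
import json
import urllib.request

MODEL_ID = "tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2"  # assumed repo name
SERVER_URL = "http://localhost:8000/v1/chat/completions"  # vLLM's default OpenAI-compatible route

def build_request(prompt: str, temperature: float = 0.7, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat-completion payload for a vLLM server."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def query(prompt: str) -> str:
    """Send the request; requires a running server, e.g.:
       vllm serve tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2"""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (only works against a live server):
#   print(query("富士山の高さを教えてください。"))
```

The same payload shape works with any OpenAI-compatible client library, so existing tooling can be pointed at the local server by swapping the base URL.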