tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2

Status: Warm
Visibility: Public
Parameters: 8B
Precision: FP8
Context length: 32768
Updated: Jan 1, 2026
License: apache-2.0
Source: Hugging Face

Qwen3-Swallow-8B-SFT-v0.2 is an 8-billion-parameter instruction-tuned large language model developed by tokyotech-llm on the Qwen3 architecture. It is optimized for bilingual Japanese-English proficiency while maintaining and enhancing performance on mathematical and coding tasks, through a pipeline of Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning with Verifiable Rewards (RLVR). Its primary use case is applications requiring strong performance in both Japanese and English, particularly STEM-related reasoning and code generation.
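As a minimal usage sketch, the model can be queried through the standard Hugging Face `transformers` chat interface. Only the model id comes from this page; the system prompt, sampling settings, and helper names below are illustrative assumptions, and the exact chat template is defined by the model's tokenizer.

```python
MODEL_ID = "tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2"


def build_messages(
    user_prompt: str,
    system_prompt: str = "You are a helpful bilingual Japanese-English assistant.",
) -> list[dict]:
    """Assemble a chat in the messages format that chat templates expect.

    The system prompt is an assumption, not a documented default.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and run one chat turn (requires transformers + a GPU)."""
    # Imported lazily so the helper above stays usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the conversation with the model's own chat template.
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("富士山の高さを英語と日本語で教えてください。"))
```

The bilingual example prompt ("Tell me the height of Mt. Fuji in English and Japanese") exercises the Japanese-English proficiency the card highlights; any prompt in either language works the same way.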
