tokyotech-llm/Qwen3-Swallow-32B-CPT-v0.2
Text generation | Concurrency cost: 2 | Model size: 32B | Quant: FP8 | Ctx length: 32K | Published: Jan 14, 2026 | License: apache-2.0 | Architecture: Transformer
Qwen3-Swallow-32B-CPT-v0.2 is a 32-billion-parameter bilingual Japanese-English large language model developed by tokyotech-llm. Built on the Qwen3 architecture, it underwent continual pre-training (CPT) over 209.7 billion tokens at a 32K context length, focused on improving Japanese language proficiency and Japanese-English translation. A key differentiator is that its math and coding performance was maintained, and in places improved, through the use of high-quality STEM datasets with reasoning traces during CPT. This makes it suitable for applications that require strong bilingual capability alongside technical reasoning.
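Since the model follows the standard Qwen3 causal-LM layout, a minimal loading-and-inference sketch with Hugging Face transformers might look like the following. This is illustrative only: it assumes the checkpoint is available on the Hub under the ID shown on this page, and that, as a CPT (non-instruct) checkpoint, it is best driven with plain completion-style prompts rather than a chat template; the prompt text is an invented example.

```python
# Minimal sketch, assuming the checkpoint is hosted on the Hugging Face Hub
# under the page's model ID and works with the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Qwen3-Swallow-32B-CPT-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # defer to the dtype stored in the checkpoint
    device_map="auto",    # shard across available GPUs (requires accelerate)
)

# Completion-style Japanese-to-English translation prompt (illustrative),
# matching the model's bilingual focus; a CPT base model has no chat template.
prompt = "次の日本語を英語に翻訳してください。\n日本語: 吾輩は猫である。\n英語:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```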