tokyotech-llm/Qwen3-Swallow-32B-CPT-v0.2
Text generation | Concurrency cost: 2 | Model size: 32B | Quant: FP8 | Ctx length: 32K | Published: Jan 14, 2026 | License: apache-2.0 | Architecture: Transformer
Qwen3-Swallow-32B-CPT-v0.2 is a 32-billion-parameter bilingual Japanese-English large language model developed by tokyotech-llm. Built on the Qwen3 architecture, it underwent continual pre-training (CPT) over 209.7 billion tokens at a 32K context length, focused on improving Japanese language proficiency and Japanese-English translation. A key differentiator is that its math and coding performance was maintained, and in places improved, through the use of high-quality STEM datasets with reasoning traces during CPT. This makes it suitable for applications that require strong bilingual capability alongside technical reasoning.
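Since the model follows the standard Qwen3 causal-LM layout, a minimal loading-and-inference sketch with Hugging Face transformers might look like the following. This is illustrative only: it assumes the checkpoint is available on the Hub under the ID shown on this page, and that, as a CPT (non-instruct) checkpoint, it is best driven with plain completion-style prompts rather than a chat template; the prompt text is an invented example.

```python
# Minimal sketch, assuming the checkpoint is hosted on the Hugging Face Hub
# under the page's model ID and works with the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Qwen3-Swallow-32B-CPT-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # defer to the dtype stored in the checkpoint
    device_map="auto",    # shard across available GPUs (requires accelerate)
)

# Completion-style Japanese-to-English translation prompt (illustrative),
# matching the model's bilingual focus; a CPT base model has no chat template.
prompt = "次の日本語を英語に翻訳してください。\n日本語: 吾輩は猫である。\n英語:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```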