Qwen3-Swallow-32B-SFT-v0.2 Overview
Qwen3-Swallow-32B-SFT-v0.2 is a 32-billion-parameter model in the Qwen3-Swallow family, developed by tokyotech-llm. It is the Supervised Fine-Tuned (SFT) stage of a development pipeline that also includes Continual Pre-Training (CPT) and Reinforcement Learning with Verifiable Rewards (RLVR).
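This overview does not specify a loading recipe, so the following is a minimal sketch using the standard Hugging Face transformers causal-LM API; the repo ID tokyotech-llm/Qwen3-Swallow-32B-SFT-v0.2 and the bfloat16 dtype are assumptions, not confirmed details.

```python
# A minimal loading sketch. The repo ID below is an assumption based on the
# developer and model names in this overview; verify it before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Qwen3-Swallow-32B-SFT-v0.2"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B parameters: bf16 roughly halves memory vs. fp32
    device_map="auto",           # shard layers across available GPUs
)
```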
Key Capabilities & Features
- Bilingual Proficiency: Optimized for both Japanese and English, with strong cross-lingual understanding and translation.
- Retained STEM Performance: Strategic CPT and SFT on high-quality math and code datasets prevent catastrophic forgetting, preserving mathematics and coding performance.
- Enhanced Reasoning: Achieves reasoning performance on par with, and in some tasks surpassing, the original Qwen3 models.
- Robust Training: Trained through a comprehensive regimen of CPT on 209.7 billion tokens and SFT on 1.1 million samples, with a 32K (32,768-token) context window.
Good For
- Japanese and English Applications: Ideal for use cases requiring strong performance in both Japanese and English, including translation and bilingual content generation.
- Technical and STEM Tasks: Suitable for applications involving mathematical problem-solving and code generation, where reasoning capabilities are crucial.
- Instruction Following: As an SFT model, it is tuned to follow instructions reliably, making it suitable for conversational and task-oriented applications; a minimal chat sketch follows this list.
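To make the bilingual, instruction-following use concrete, here is a hedged sketch of a single-turn Japanese-to-English translation request via the tokenizer's chat template. The repo ID and the presence of a chat template are assumptions based on this overview, not verified details.

```python
# A minimal chat sketch for a Japanese -> English translation request.
# Assumes the (unverified) repo ID below and that the tokenizer ships a
# standard chat template, as Qwen3-derived instruct models typically do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Qwen3-Swallow-32B-SFT-v0.2"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Japanese instruction: "Translate the following sentence into English: ..."
messages = [{
    "role": "user",
    "content": "次の文を英語に翻訳してください: 継続事前学習は日本語の性能を向上させます。",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Given the 32,768-token context window noted above, longer bilingual documents should fit in a single prompt, subject to memory limits.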