tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-fp16

Text Generation · Concurrency Cost: 2 · Model Size: 32B · Quant: FP16 · Context Length: 32k · Published: Mar 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-fp16 is a 32-billion-parameter dense transformer converted to MLX format for efficient use on Apple Silicon. The underlying model was developed by the Swallow Project (Institute of Science Tokyo and AIST) on top of Qwen3 as a bilingual Japanese-English model. It performs strongly on both Japanese and English tasks and retains capabilities in mathematics and coding, a result of its Continual Pre-Training, Supervised Fine-Tuning, and Reinforcement Learning pipeline.


Overview

This model, tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-fp16, is an MLX-format conversion of the tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2 model, optimized for Apple Silicon. It is a 32 billion parameter dense transformer with unquantized fp16 weights; at roughly two bytes per parameter, that works out to a download size of approximately 61 GB.

Key Capabilities

  • Bilingual Proficiency: Developed by the Swallow Project (Institute of Science Tokyo and AIST), it is a large language model proficient in both Japanese and English.
  • Robust Training: Built upon Qwen3, the model underwent Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL) to enhance its performance.
  • Multifaceted Strengths: Achieves strong performance across language tasks while retaining capabilities in mathematics and coding.
  • Apple Silicon Optimization: Converted to the MLX format, making it efficient for use on Apple Silicon hardware.

Usage

This model supports three usage modes: programmatic generation from Python via mlx_lm, interactive chat from the command line, and an OpenAI-compatible server for broader client compatibility. Hedged sketches of each mode follow.
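
For programmatic use, the high-level mlx_lm API loads the weights and runs generation in a few lines. This is a minimal sketch assuming mlx-lm is installed (`pip install mlx-lm`); the prompt text and `max_tokens` value are illustrative.

```python
# Minimal generation sketch with mlx_lm; prompt and settings are illustrative.
from mlx_lm import load, generate

# Downloads (or reuses a cached copy of) the MLX weights and tokenizer.
model, tokenizer = load("tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-fp16")

# Format the request with the model's chat template.
messages = [{"role": "user", "content": "日本の首都について簡単に説明してください。"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(response)
```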
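
For interactive chat and serving, mlx-lm ships command-line entry points. The commands below are a sketch; exact flags may vary across mlx-lm versions, and port 8080 is an arbitrary choice.

```bash
# Interactive chat in the terminal:
mlx_lm.chat --model tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-fp16

# OpenAI-compatible HTTP server; port 8080 is an example value:
mlx_lm.server --model tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-fp16 --port 8080
```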
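
Once the server is running, any OpenAI-compatible client can talk to it. The sketch below assumes the server from the previous step is listening on localhost:8080; the `api_key` value is a placeholder, since the local server does not authenticate requests.

```python
# Query the local OpenAI-compatible endpoint; assumes `pip install openai`
# and an mlx_lm.server instance on port 8080 as shown above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-fp16",
    messages=[{"role": "user", "content": "Summarize what MLX is in one sentence."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```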