RLT-32B: A Reasoning-Focused Student Model
RLT-32B is a 32.8-billion-parameter autoregressive language model developed by Sakana AI. It is a "student" model distilled from a 7B Reinforcement-Learned Teacher (RLT). The core innovation is the Reinforcement-Learned Teachers pipeline: the teacher is explicitly trained to generate high-quality reasoning traces, which are then used to supervise the student model's training.
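The pipeline described above can be sketched in a few lines. This is an illustrative assumption of how teacher traces become student training data, not the released training code: the function names, the stubbed teacher, and the `<think>` tag convention are all placeholders.

```python
# Illustrative sketch of the RLT teacher-student distillation pipeline.
# An RLT teacher sees BOTH the question and the ground-truth solution and
# is trained (via RL) to emit reasoning traces that help a student; here
# the teacher is stubbed out as a plain function for clarity.

def teacher_explain(question: str, solution: str) -> str:
    # Placeholder for a trained RLT teacher's step-by-step reasoning trace.
    return f"To answer '{question}', work through the steps to reach {solution}."

def build_student_example(question: str, solution: str) -> dict:
    """Wrap a teacher trace into a supervised fine-tuning example.

    The <think>...</think> reasoning tags are an assumed convention; the
    actual tag format for RLT-32B is defined in the released code.
    """
    trace = teacher_explain(question, solution)
    return {
        "prompt": question,
        "completion": f"<think>\n{trace}\n</think>\n{solution}",
    }

example = build_student_example("What is 2 + 2?", "4")
```

The key design point this sketch reflects is that the student never runs reinforcement learning itself; it learns purely from supervised pairs whose reasoning portion was produced by the RL-trained teacher.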
Key Capabilities & Training
- Distilled Reasoning: Training emphasizes distilling reasoning capability from a specialized teacher, with the aim of stronger reasoning performance than standard fine-tuning alone.
- Supervised Fine-Tuning: The model was fine-tuned with specific hyperparameters, a system prompt, and reasoning tags, following the methodology of Li et al. (2025).
- Experimental Prototype: RLT-32B is released as an experimental prototype for research and development.
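The supervised fine-tuning setup above (system prompt plus reasoning tags) can be sketched as follows. This is a minimal, hand-rolled rendering under stated assumptions: the system prompt text, the `<|...|>` role markers, and the `<think>` tags are placeholders, and a real run would use the model tokenizer's chat template rather than string concatenation.

```python
# Minimal sketch of assembling a chat-style SFT sequence with a system
# prompt and reasoning tags. All markup below is a placeholder convention;
# the released fine-tuning code defines the actual format.

SYSTEM_PROMPT = "You are a helpful assistant that reasons step by step."  # placeholder

def render_chat(system: str, user: str, assistant: str) -> str:
    # Generic role-tagged rendering of one training conversation.
    parts = [
        f"<|system|>\n{system}",
        f"<|user|>\n{user}",
        f"<|assistant|>\n{assistant}",
    ]
    return "\n".join(parts)

text = render_chat(
    SYSTEM_PROMPT,
    "What is 2 + 2?",
    "<think>\n2 + 2 = 4.\n</think>\n4",
)
```

Keeping the reasoning inside explicit tags lets evaluation code separate the trace from the final answer, which is why such tag conventions are common in reasoning-distillation setups.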
Research Focus
This model is intended primarily for research into advanced language model training techniques, particularly teacher-student distillation and reinforcement learning for enhanced reasoning. Further details on its development and evaluation can be found in the associated paper and code repository.