RLT-32B: A Reasoning-Focused Student Model
RLT-32B is a 32.8-billion-parameter autoregressive language model developed by Sakana AI. It is a "student" model distilled from a 7B Reinforcement-Learned Teacher (RLT). The core innovation is the Reinforcement-Learned Teachers pipeline: the teacher is explicitly trained to generate high-quality reasoning traces, which are then used to supervise the student model's training.
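The pipeline described above can be sketched in a few lines. This is an illustrative assumption of how teacher traces become student training data, not the released training code: the function names, the stubbed teacher, and the `<think>` tag convention are all placeholders.

```python
# Illustrative sketch of the RLT teacher-student distillation pipeline.
# An RLT teacher sees BOTH the question and the ground-truth solution and
# is trained (via RL) to emit reasoning traces that help a student; here
# the teacher is stubbed out as a plain function for clarity.

def teacher_explain(question: str, solution: str) -> str:
    # Placeholder for a trained RLT teacher's step-by-step reasoning trace.
    return f"To answer '{question}', work through the steps to reach {solution}."

def build_student_example(question: str, solution: str) -> dict:
    """Wrap a teacher trace into a supervised fine-tuning example.

    The <think>...</think> reasoning tags are an assumed convention; the
    actual tag format for RLT-32B is defined in the released code.
    """
    trace = teacher_explain(question, solution)
    return {
        "prompt": question,
        "completion": f"<think>\n{trace}\n</think>\n{solution}",
    }

example = build_student_example("What is 2 + 2?", "4")
```

The key design point this sketch reflects is that the student never runs reinforcement learning itself; it learns purely from supervised pairs whose reasoning portion was produced by the RL-trained teacher.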
Key Capabilities & Training
- Distilled Reasoning: Training emphasizes distilling reasoning capability from a specialized teacher, with the aim of stronger reasoning performance than standard fine-tuning alone.
- Supervised Fine-Tuning: The model was fine-tuned with specific hyperparameters, a system prompt, and reasoning tags, following the methodology of Li et al. (2025).
- Experimental Prototype: RLT-32B is released as an experimental prototype for research and development.
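The supervised fine-tuning setup above (system prompt plus reasoning tags) can be sketched as follows. This is a minimal, hand-rolled rendering under stated assumptions: the system prompt text, the `<|...|>` role markers, and the `<think>` tags are placeholders, and a real run would use the model tokenizer's chat template rather than string concatenation.

```python
# Minimal sketch of assembling a chat-style SFT sequence with a system
# prompt and reasoning tags. All markup below is a placeholder convention;
# the released fine-tuning code defines the actual format.

SYSTEM_PROMPT = "You are a helpful assistant that reasons step by step."  # placeholder

def render_chat(system: str, user: str, assistant: str) -> str:
    # Generic role-tagged rendering of one training conversation.
    parts = [
        f"<|system|>\n{system}",
        f"<|user|>\n{user}",
        f"<|assistant|>\n{assistant}",
    ]
    return "\n".join(parts)

text = render_chat(
    SYSTEM_PROMPT,
    "What is 2 + 2?",
    "<think>\n2 + 2 = 4.\n</think>\n4",
)
```

Keeping the reasoning inside explicit tags lets evaluation code separate the trace from the final answer, which is why such tag conventions are common in reasoning-distillation setups.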
Research Focus
This model is intended primarily for research into advanced language model training techniques, particularly teacher-student distillation and reinforcement learning for enhanced reasoning. Further details on its development and evaluation can be found in the associated paper and code repository.