SakanaAI/RLT-32B
RLT-32B: A Reasoning-Focused Student Model
RLT-32B is a 32.8 billion parameter autoregressive language model developed by Sakana AI. It is a "student" model, distilled from a 7B Reinforcement-Learned Teacher (RLT). The core innovation is the RLT pipeline: the teacher is explicitly trained, via reinforcement learning, to generate high-quality reasoning traces, which are then used as supervision for the student's training.
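The teacher-to-student data flow described above can be sketched as follows. This is a minimal illustration, not the released training code: the `<think>...</think>` tag convention and the stub teacher function are assumptions made for the example.

```python
# Illustrative sketch of the RLT distillation data flow.
# Assumptions (not from the model card): the <think>...</think> tag
# format and the stand-in teacher_explain function are hypothetical.

def teacher_explain(question: str, solution: str) -> str:
    """Stand-in for the 7B RLT teacher: given a question AND its known
    solution, emit a step-by-step reasoning trace connecting the two."""
    return f"To solve '{question}', work through the key steps and conclude {solution}."

def build_sft_example(question: str, solution: str) -> dict:
    """Wrap the teacher's trace in reasoning tags to form one supervised
    fine-tuning example for the student model."""
    trace = teacher_explain(question, solution)
    target = f"<think>\n{trace}\n</think>\n{solution}"
    return {"prompt": question, "completion": target}

example = build_sft_example("What is 2 + 2?", "4")
print(example["completion"])
```

The key design point of RLT is visible in `teacher_explain`: the teacher is conditioned on both the question and the known solution, so it only needs to produce a good explanation, not solve the problem from scratch.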
Key Capabilities & Training
- Distilled Reasoning: The model's training emphasizes the distillation of reasoning capabilities from a specialized teacher, aiming for improved reasoning performance.
- Supervised Fine-Tuning: It was fine-tuned using specific hyperparameters, a system prompt, and reasoning tags, following methodologies from Li et al. 2025.
- Experimental Prototype: RLT-32B is released as an experimental prototype for research and development.
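Because the model is fine-tuned with reasoning tags, downstream code typically separates the reasoning trace from the final answer at inference time. A minimal parser, assuming a `<think>...</think>` convention (the tags actually used by RLT-32B may differ):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning trace from the final answer.
    Returns (trace, answer); trace is empty if no tags are present.
    The tag names are an assumption for illustration."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    return trace, answer

trace, answer = split_reasoning("<think>\n2 + 2 = 4\n</think>\nThe answer is 4.")
print(answer)  # → The answer is 4.
```

Keeping the trace and answer separate makes it easy to log or evaluate the reasoning while showing users only the final answer.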
Research Focus
This model is primarily intended for research into advanced language model training techniques, particularly those involving teacher-student distillation and reinforcement learning for enhanced reasoning. Further details on its development and evaluation can be found in the associated paper and code repository.