SakanaAI/RLT-7B: Reinforcement-Learned Teacher Student Model
RLT-7B is a 7.6 billion parameter autoregressive language model developed by Sakana AI. It is a "student" model, distilled from a 7B Reinforcement-Learned Teacher. The core innovation is the Reinforcement-Learned Teachers (RLT) pipeline: the teacher is explicitly trained to generate high-quality reasoning traces, which are then used to distill knowledge into the student model.
Key Characteristics
- RLT Pipeline: Utilizes a novel Reinforcement-Learned Teachers approach for distillation, focusing on reasoning trace quality.
- Reasoning Optimization: Trained with supervised fine-tuning using specific hyperparameters and reasoning tags from Li et al. 2025 to enhance reasoning capabilities.
- Context Length: Supports a long context window of 131,072 tokens.
- Research Prototype: Released as an experimental prototype for research and development; not intended for commercial deployment.
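The reasoning-tag output style mentioned above can be illustrated with a small sketch. The exact tag names (`<think>`/`</think>`) and the helper function below are illustrative assumptions, not taken from this model card; the actual tags come from the model's chat template.

```python
# Sketch: splitting a model completion into a reasoning trace and a final
# answer. Tag names are assumed for illustration only.

def split_trace(completion: str,
                open_tag: str = "<think>",
                close_tag: str = "</think>") -> tuple[str, str]:
    """Split a completion into (reasoning_trace, final_answer).

    If the tags are absent, the whole completion is treated as the answer.
    """
    start = completion.find(open_tag)
    end = completion.find(close_tag)
    if start == -1 or end == -1 or end < start:
        return "", completion.strip()
    trace = completion[start + len(open_tag):end].strip()
    answer = completion[end + len(close_tag):].strip()
    return trace, answer


trace, answer = split_trace("<think>12 * 7 = 84</think> The answer is 84.")
```

Separating the trace this way is useful when only the final answer should be scored, while the trace is kept for inspection.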
Evaluation and Resources
Evaluation of RLT-7B was conducted using the SkyThought library. Further details on the RLT pipeline, training methodology, and results can be found in the associated paper and the Sakana AI RLT repository.
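As a sketch of the distillation step the pipeline describes, a teacher's reasoning traces can be packaged into supervised fine-tuning examples for the student. The field names and tag format here are assumptions for illustration, not the actual RLT data format:

```python
# Sketch: turning a teacher-generated reasoning trace into one SFT example
# for student distillation. All field and tag names are illustrative.

def make_sft_example(question: str, trace: str, solution: str) -> dict:
    """Assemble one supervised fine-tuning example: the student learns to
    reproduce the teacher's reasoning trace followed by the solution."""
    target = f"<think>\n{trace}\n</think>\n{solution}"
    return {"prompt": question, "completion": target}


example = make_sft_example(
    question="What is 12 * 7?",
    trace="12 * 7 = 84.",
    solution="84",
)
```

A dataset of such examples would then be fed to a standard SFT trainer, which is where the teacher's RL-optimized trace quality pays off.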