SakanaAI/RLT-7B

Text Generation · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Jun 21, 2025 · License: apache-2.0 · Architecture: Transformer · Concurrency Cost: 1 · Open Weights

Sakana AI's RLT-7B is a 7.6 billion parameter autoregressive language model with a 131,072-token context length, built with the Reinforcement-Learned Teachers (RLT) pipeline: a 7B teacher model is trained with reinforcement learning to produce high-quality reasoning traces, and this student model is distilled from those traces via supervised fine-tuning with specific hyperparameters and reasoning tags. It is optimized for reasoning tasks and released for research and development as an experimental prototype.


SakanaAI/RLT-7B: Reinforcement-Learned Teacher Student Model

RLT-7B is a 7.6 billion parameter autoregressive language model developed by Sakana AI. This model is a "student" model, distilled from a 7B Reinforcement-Learned Teacher. The core innovation lies in the Reinforcement-Learned Teachers (RLT) pipeline, where the teacher model is explicitly trained to generate high-quality reasoning traces, which are then used to distill knowledge into the student model.
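At its core, the distillation step described above reduces to ordinary next-token cross-entropy on teacher-generated traces: the student is penalized for assigning low probability to the tokens of the teacher's reasoning trace. A toy sketch of that objective (the vocabulary size, logits, and trace tokens here are invented purely for illustration):

```python
import math

def cross_entropy_on_trace(logits_per_step, trace_token_ids):
    """Average negative log-likelihood the student assigns to a teacher trace.

    logits_per_step: one list of unnormalized student logits per trace position.
    trace_token_ids: the teacher's token at each position (the SFT target).
    """
    total = 0.0
    for logits, target in zip(logits_per_step, trace_token_ids):
        # Numerically stable log-sum-exp for the softmax normalizer.
        z = max(logits)
        log_norm = z + math.log(sum(math.exp(l - z) for l in logits))
        # Negative log-probability of the teacher's token under the student.
        total += log_norm - logits[target]
    return total / len(trace_token_ids)

# A student that is uniform over a 4-token vocabulary pays log(4) nats per token.
uniform_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = cross_entropy_on_trace(uniform_logits, [1, 2])
```

In the actual pipeline this loss is computed by a training framework over real tokenized traces; the sketch only shows the shape of the objective.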

Key Characteristics

  • RLT Pipeline: Utilizes a novel Reinforcement-Learned Teachers approach for distillation, focusing on reasoning trace quality.
  • Reasoning Optimization: Trained with supervised fine-tuning, following the hyperparameters and reasoning-tag format of Li et al. (2025), to enhance reasoning capabilities.
  • Context Length: Features a notable context length of 131,072 tokens.
  • Research Prototype: An experimental prototype intended for research and development, not commercial deployment.
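Since the weights are open, the checkpoint can presumably be loaded like any standard Hugging Face causal language model. The model id below comes from this card; the prompt and generation settings are illustrative assumptions, not documented defaults:

```python
MODEL_ID = "SakanaAI/RLT-7B"

def build_generation_kwargs(max_new_tokens=1024, temperature=0.6, top_p=0.95):
    # Hypothetical sampling defaults for illustration; tune on your own evals.
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }

def generate_reasoning(prompt):
    # Heavy imports deferred so the module is importable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, **build_generation_kwargs())
    return tokenizer.decode(output[0], skip_special_tokens=True)

# generate_reasoning("Prove that the sum of two even integers is even.")
# (commented out: downloads the 7.6B checkpoint)
```

Because this is a reasoning-tuned student, longer `max_new_tokens` budgets than usual are likely needed to leave room for the reasoning trace before the final answer.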

Evaluation and Resources

Evaluation of RLT-7B was conducted using the SkyThought library. Further details on the RLT pipeline, training methodology, and results can be found in the associated paper and the Sakana AI RLT repository.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model cover the following samplers:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p