RoyceLu/OpenR1-Distill-0.6B
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 0.8B · Quant: BF16 · Ctx Length: 32k · Published: Mar 5, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

RoyceLu/OpenR1-Distill-0.6B is a 0.8-billion-parameter model built on Qwen3-0.6B-Base and fine-tuned with the Open-R1 supervised distillation recipe on the Mixture-of-Thoughts dataset. It is optimized for reasoning-oriented text generation and shows improved scores on benchmarks such as AIME 2024 and GPQA Diamond relative to its base model. With a 32,768-token context length, it targets applications that require robust reasoning over long inputs.


OpenR1-Distill-0.6B: Reasoning-Oriented Distilled Model

OpenR1-Distill-0.6B is a 0.8-billion-parameter language model developed by RoyceLu on top of Qwen/Qwen3-0.6B-Base. It was fine-tuned with the Open-R1 supervised distillation recipe on the open-r1/Mixture-of-Thoughts dataset, training on 349,317 samples for 5 epochs.

Key Capabilities & Features

  • Reasoning Enhancement: Specifically fine-tuned for reasoning-oriented text generation, aiming to improve logical processing and problem-solving.
  • Extended Context Window: Supports a maximum sequence length of 32,768 tokens, allowing it to process longer inputs and generate more extensive outputs.
  • Distillation Approach: Uses a supervised distillation recipe to transfer knowledge from a larger teacher model (implied by the Open-R1 recipe) into a smaller, more efficient 0.8B-parameter student.
  • Benchmark Improvements: Outperforms its base model on benchmarks such as AIME 2024 (up to +0.73 pp) and GPQA Diamond (up to +1.89 pp) at specific `max_new_tokens` settings, indicating stronger reasoning and general-knowledge capabilities.
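Because generation shares the 32,768-token window with the prompt, callers need to budget how many new tokens they can still request. A minimal sketch of that bookkeeping (the helper name and the illustrative token count are assumptions, not part of the model card):

```python
# Sketch: budgeting prompt + generated tokens inside the model's
# 32,768-token context window. In practice the prompt length would
# come from the model's tokenizer; the count below is illustrative.

MAX_CONTEXT = 32768  # maximum sequence length reported for OpenR1-Distill-0.6B


def remaining_new_tokens(prompt_tokens: int, context_limit: int = MAX_CONTEXT) -> int:
    """Return how many tokens can still be generated for a prompt of
    `prompt_tokens` tokens without exceeding the context window."""
    if prompt_tokens >= context_limit:
        raise ValueError("prompt already fills or exceeds the context window")
    return context_limit - prompt_tokens


print(remaining_new_tokens(30000))  # → 2768
```

A long 30,000-token prompt therefore leaves room for at most 2,768 generated tokens before the window is exhausted.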

Good For

  • Reasoning Tasks: Ideal for applications requiring logical deduction, problem-solving, and complex question answering.
  • Resource-Constrained Environments: As a 0.8B parameter model, it offers a more efficient solution for reasoning tasks compared to larger models, while maintaining a substantial context window.
  • Text Generation: Suited for generating coherent and contextually relevant text, particularly in scenarios where reasoning is a primary requirement.
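As a sketch of how such a checkpoint is typically loaded for the tasks above with Hugging Face `transformers` (the model id comes from this card; the prompt, dtype, and generation length are illustrative assumptions, not values from the card):

```python
# Illustrative sketch: loading OpenR1-Distill-0.6B with Hugging Face
# transformers for reasoning-oriented generation. Sampling settings
# here are assumptions, not recommendations from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RoyceLu/OpenR1-Distill-0.6B"


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Download the checkpoint (on first call) and return a completion.

    The card reports a 32,768-token context window, so prompt tokens
    plus `max_new_tokens` should stay under that limit.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Example call (downloads the model weights on first run):
# print(generate("If a train travels 60 km in 45 minutes, what is its average speed in km/h?"))
```

BF16 matches the quantization listed in the card's metadata; for CPU-only environments, dropping the `torch_dtype` argument and using the default precision is a reasonable fallback.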