90dkn0ws/OpenR1-Distill-0.6B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 28, 2025Architecture:Transformer Warm

90dkn0ws/OpenR1-Distill-0.6B is an 0.8 billion parameter causal language model, fine-tuned by 90dkn0ws from Qwen/Qwen3-0.6B-Base. It was trained using the TRL library on the open-r1/Mixture-of-Thoughts dataset. This model is optimized for general text generation tasks, leveraging its distillation from a larger base model for efficient inference.

Loading preview...

Model Overview

90dkn0ws/OpenR1-Distill-0.6B is an 0.8 billion parameter language model derived from the Qwen/Qwen3-0.6B-Base architecture. It has been specifically fine-tuned using the TRL library on the open-r1/Mixture-of-Thoughts dataset, aiming to distill capabilities for improved performance on conversational and reasoning-oriented prompts.

Key Capabilities

  • General Text Generation: Capable of generating coherent and contextually relevant text based on user prompts.
  • Instruction Following: Benefits from fine-tuning on a diverse dataset, enhancing its ability to follow instructions.
  • Efficient Inference: As a 0.8B parameter model, it offers a balance between performance and computational efficiency.

Training Details

The model underwent Supervised Fine-Tuning (SFT) using TRL version 0.18.0.dev0, Transformers 4.52.0.dev0, Pytorch 2.6.0, and Datasets 3.6.0. This process adapted the base Qwen3 model to the specific patterns and knowledge present in the Mixture-of-Thoughts dataset.

Good For

  • Quick Prototyping: Its smaller size makes it suitable for rapid development and testing of language-based applications.
  • Resource-Constrained Environments: Ideal for deployment where computational resources or memory are limited.
  • Exploratory NLP Tasks: Can be used for various natural language processing tasks requiring text generation or understanding.