keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill

Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Feb 1, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill is a 1-billion-parameter instruction-tuned causal language model developed by keeeeenw, built on Llama-3.2-1B-Instruct with Hugging Face's OpenR1 framework. It is designed to bring strong reasoning capabilities to a compact, efficient architecture, making it suitable for on-device AI assistants and mobile applications. With a context length of 32,768 tokens, it supports systematic long-thinking processes for general-purpose and reasoning tasks despite its small size.


Llama-3.2-1B-Instruct-Open-R1-Distill: Reasoning in a Compact Form

This model, developed by keeeeenw, is a 1 billion parameter instruction-tuned language model based on Llama-3.2-1B-Instruct and Hugging Face's OpenR1 framework. It aims to deliver robust reasoning capabilities within a highly efficient architecture, enabling deployment on devices like laptop CPUs and smartphones.
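Since the model ships as standard open weights, it can be loaded through the Hugging Face `transformers` library. The sketch below is a minimal, hedged example: the system-prompt-free chat format and the generation settings are assumptions, not prescriptions from the model card, and the `transformers` import is kept inside the function so the helpers remain usable without it installed.

```python
MODEL_ID = "keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill"

def build_messages(question: str) -> list[dict]:
    # Plain chat-format input; no system prompt is assumed here --
    # add one if your use case calls for it.
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 1024) -> str:
    # Lazy import so build_messages stays usable without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    result = generator(build_messages(question), max_new_tokens=max_new_tokens)
    # Recent transformers versions return the full chat history;
    # the last turn is the assistant's reply.
    return result[0]["generated_text"][-1]["content"]
```

A generous `max_new_tokens` budget matters here because the model thinks step by step before answering, so its replies can run long.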

Key Capabilities

  • Enhanced Reasoning: Uses a distillation approach, fine-tuning on a dataset derived from the DeepSeek-R1 teacher model to impart strong reasoning skills.
  • Compact & Efficient: Designed for lightweight operation, making it ideal for resource-constrained environments.
  • Systematic Thought Process: Instruction-tuned to explore questions through a detailed, multi-step thinking process before providing solutions, as demonstrated in its sample outputs.
  • General-Purpose & On-Device AI: Suitable for a range of tasks where reasoning and efficiency are critical.
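Because the model emits its multi-step thinking before the final solution, downstream applications often want to separate the two. The helper below is a sketch that assumes DeepSeek-R1-style `<think>…</think>` delimiters (common for R1 distills; the exact format this model emits is an assumption, so verify against real outputs).

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    # Separate the chain-of-thought from the final answer, assuming
    # DeepSeek-R1-style <think>...</think> delimiters around the reasoning.
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if m is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", output.strip()
    reasoning = m.group(1).strip()
    answer = output[m.end():].strip()
    return reasoning, answer
```

This lets a chat UI show only the answer by default while keeping the reasoning trace available for inspection.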

Good for

  • On-device AI assistants requiring reasoning and general-purpose task execution.
  • Mobile and edge AI applications where model size and efficiency are paramount.
  • Chatbots and virtual assistants optimized for efficient processing.
  • Further fine-tuning for specific domains via Supervised Fine-Tuning (SFT).
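For the SFT path, a small model like this can be fine-tuned on a single GPU. The sketch below assumes TRL's `SFTTrainer` (which OpenR1 builds on); the dataset name and hyperparameters are hypothetical placeholders to tune for your hardware and data, and the heavy imports are deferred so the config helper runs standalone.

```python
def make_sft_config(output_dir: str = "llama1b-sft") -> dict:
    # Hypothetical starting hyperparameters for a 1B model on one GPU.
    return {
        "output_dir": output_dir,
        "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8,
        "learning_rate": 2e-5,
        "num_train_epochs": 1,
    }

def train(dataset_name: str = "your/reasoning-dataset") -> None:
    # Lazy imports so make_sft_config stays usable without trl/datasets.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    trainer = SFTTrainer(
        model="keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill",
        args=SFTConfig(**make_sft_config()),
        train_dataset=load_dataset(dataset_name, split="train"),
    )
    trainer.train()
```

Gradient accumulation keeps the effective batch size reasonable (here 16) while the per-device batch stays small enough for modest GPU memory.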

While the model demonstrates coherent step-by-step reasoning, it can sometimes be verbose or fall into repetitive loops; prompt engineering and further training may mitigate this. On math_500 it achieved an extractive_match score of 0.216, leaving clear room for improvement, particularly with more targeted mathematical distillation data.