whooray/Qwen2.5-1.5B-Open-R1-Distill-ko

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Feb 8, 2025Architecture:Transformer Warm

whooray/Qwen2.5-1.5B-Open-R1-Distill-ko is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by whooray, this model is specifically optimized for Korean reasoning tasks, leveraging the lemon-mint/korean-reasoning-v02 dataset. It supports a 32768-token context length and is designed for applications requiring robust Korean language understanding and logical inference.

Loading preview...

Model Overview

This model, whooray/Qwen2.5-1.5B-Open-R1-Distill-ko, is a specialized version of the Qwen2.5-1.5B-Instruct base model. It has been fine-tuned using the TRL library on the lemon-mint/korean-reasoning-v02 dataset, indicating a strong focus on enhancing its capabilities for Korean language reasoning tasks. The model supports a substantial context length of 32768 tokens, allowing for processing longer inputs and maintaining coherence over extended dialogues or documents.

Key Capabilities

  • Korean Reasoning: Specifically trained on a Korean reasoning dataset to improve logical inference and problem-solving in Korean.
  • Multilingual Support: Inherits multilingual capabilities from the base Qwen2.5 model, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
  • Instruction Following: Benefits from the instruction-tuned nature of its base model, enabling it to follow user prompts effectively.

Good For

  • Korean Language Applications: Ideal for tasks requiring nuanced understanding and generation of Korean text, particularly those involving reasoning.
  • Research and Development: Suitable for researchers exploring fine-tuning techniques for specific language tasks or distilling larger models for efficiency.
  • Resource-Constrained Environments: As a 1.5 billion parameter model, it offers a balance between performance and computational efficiency compared to larger models.