bekhzod-olimov/Qwen3-0.6B-Instruct-Uz

Hugging Face

Text generation · Concurrency cost: 1 · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Sep 3, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

The Qwen3-0.6B-Instruct-Uz v2.0 model by Bekhzod Olimov is a 0.6-billion-parameter, fully fine-tuned Uzbek language model built on the Qwen2.5-0.5B-Instruct architecture. Optimized for efficiency, it has the lowest GPU VRAM usage (1.12 GB) and lowest inference latency (5.10 s) among comparable Uzbek models. It is designed for cost-effective production deployment, excelling in scenarios that require resource efficiency and strong Uzbek language understanding.


Overview

Qwen3-0.6B-Instruct-Uz v2.0, developed by Bekhzod Olimov, is a fully fine-tuned Uzbek language model with 0.6 billion parameters, built on the Qwen2.5-0.5B-Instruct base. This version is a complete redesign of the beta release, focused on production-grade performance and efficiency.

Key Capabilities & Differentiators

  • Resource Efficiency: Achieves the lowest GPU VRAM usage (1.12 GB) and lowest inference latency (5.10 s) of the compared Uzbek models, making it highly cost-effective to deploy.
  • Full Fine-tuning: Unlike LoRA or vocabulary expansion, all 596 million parameters were fine-tuned on 162,508 high-quality Uzbek instruction examples, ensuring better quality and stability.
  • Zero Repetition: Optimized generation parameters eliminate repetition issues found in previous versions.
  • Uzbek Language Focus: Specifically designed for strong Uzbek language understanding, outperforming larger models like Llama-3.2-1B in this regard.
  • Production-Ready: Verified for deployment, offering significant cost savings (40-94% cheaper) and high throughput (28.84 tok/s).
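The card does not publish the tuned generation parameters behind the "zero repetition" claim, but such setups typically combine a repetition penalty with sampling constraints. A minimal sketch with placeholder values (all numbers below are illustrative assumptions, not the model's actual configuration):

```python
# Illustrative generation parameters for suppressing repetition.
# All values are placeholders, NOT this model's published settings.
generation_config = {
    "do_sample": True,
    "temperature": 0.7,         # soften the token distribution
    "top_p": 0.9,               # nucleus sampling
    "repetition_penalty": 1.1,  # down-weight already-generated tokens
    "no_repeat_ngram_size": 3,  # forbid repeating any 3-gram verbatim
    "max_new_tokens": 256,
}

# In Hugging Face transformers, keys like these can be passed to generate():
#   model.generate(**inputs, **generation_config)
```

A repetition penalty above 1.0 combined with an n-gram ban is a common belt-and-suspenders choice for small instruction models, which loop more readily than larger ones.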

Ideal Use Cases

  • Customer Service Chatbots: Provides real-time, cost-effective responses with Uzbek cultural understanding.
  • Mobile & Edge Devices: Its low VRAM footprint allows for on-device inference on consumer GPUs (e.g., GTX 1650+).
  • Educational Applications: Suitable for schools and interactive learning tools with limited hardware.
  • Cost-Sensitive Deployments: Excellent for startups, NGOs, and research projects due to its efficiency.
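The low-VRAM claim is consistent with back-of-the-envelope arithmetic: 596 million parameters at 2 bytes each (BF16) comes to roughly 1.11 GiB of weights, in line with the cited 1.12 GB once activations and small buffers are set aside.

```python
# Estimate weight memory for the 596M-parameter model in BF16.
params = 596_000_000
bytes_per_param = 2  # BF16 = 16 bits = 2 bytes

weight_bytes = params * bytes_per_param
print(f"{weight_bytes / 1e9:.2f} GB (decimal)")    # 1.19 GB
print(f"{weight_bytes / 2**30:.2f} GiB (binary)")  # 1.11 GiB
```

This is why the model fits on consumer GPUs such as a 4 GB GTX 1650 with headroom for the KV cache.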

Limitations

While highly efficient, the model is not recommended for professional translation services, complex reasoning tasks, or high-stakes decisions where maximum quality matters more than cost. Its knowledge breadth is also narrower than that of much larger models.