zhangsq-nju/Qwen3-0.6B-EdgeRazor-2.79bit
The zhangsq-nju/Qwen3-0.6B-EdgeRazor-2.79bit model is a 0.6-billion-parameter language model developed by zhangsq-nju, based on Qwen3-0.6B. It uses a mixed-precision quantization scheme produced with the EdgeRazor framework: 2.79-bit for all decoder layers and 4-bit for embedding and lm_head. The model is optimized for deployment on edge devices, trading some accuracy for a smaller memory footprint than its full-precision base model.
Qwen3-0.6B-EdgeRazor-2.79bit Overview
This model is a quantized version of the Qwen3-0.6B base model, developed by zhangsq-nju using their EdgeRazor framework. It implements a mixed-precision quantization strategy (2.79-bit for all decoder layers, 4-bit for embedding and lm_head) that aims to significantly reduce model size and computational requirements while maintaining competitive performance.
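As a rough illustration of what the mixed-precision scheme means for weight storage, the sketch below estimates the footprint from per-group bit-widths. The parameter split used here (about 0.44B decoder parameters and a 151,936 x 1,024 embedding matrix) is an assumption taken from the commonly cited Qwen3-0.6B configuration, not a figure from this model card, and the estimate ignores quantization metadata such as scales and zero-points, so real checkpoint sizes will differ.

```python
def total_bits(groups):
    """Sum storage bits over (param_count, bits_per_weight) groups."""
    return sum(count * bits for count, bits in groups)


def weight_footprint_mib(groups):
    """Approximate weight storage in MiB, ignoring scales and other metadata."""
    return total_bits(groups) / 8 / 2**20


def effective_bits(groups):
    """Parameter-weighted average bit-width across all groups."""
    params = sum(count for count, _ in groups)
    return total_bits(groups) / params


# ASSUMED parameter counts (not stated in the model card).
DECODER_PARAMS = 440_000_000        # assumed non-embedding parameters
EMBED_PARAMS = 151_936 * 1_024      # assumed vocab size x hidden size

# Bit-widths below are the ones stated in the model card.
quantized = [(DECODER_PARAMS, 2.79), (EMBED_PARAMS, 4.0)]
# 16-bit baseline is an assumption about the full-precision checkpoint.
baseline = [(DECODER_PARAMS + EMBED_PARAMS, 16.0)]

if __name__ == "__main__":
    print(f"quantized weights: {weight_footprint_mib(quantized):.1f} MiB")
    print(f"16-bit baseline:   {weight_footprint_mib(baseline):.1f} MiB")
    print(f"effective bits:    {effective_bits(quantized):.2f}")
```

Because the embedding and lm_head stay at 4-bit, the effective bit-width over all weights in this estimate lands somewhat above 2.79; the 2.79-bit figure applies to the decoder layers.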
Key Capabilities
- Efficient Inference: Designed for deployment in resource-constrained environments such as edge devices, enabled by its aggressive quantization.
- Mixed-Precision Quantization: Uses an average of 2.79 bits per weight in the decoder layers, with 4-bit embedding and lm_head, balancing compression with accuracy.
- Instruction Following: Trained in instruct mode, making it suitable for chat and prompt-based interactions.
Good for
- Edge Device Deployment: Ideal for applications requiring a lightweight LLM on mobile, IoT, or other embedded systems.
- Resource-Constrained Environments: When memory and computational power are limited, this model offers a viable solution.
- Cost-Effective Inference: Reduces the operational cost associated with running larger, full-precision models.
Benchmarks show that quantization introduces some degradation compared to the full-precision Qwen3-0.6B, but the 2.79-bit EdgeRazor variant still achieves an average score of 44.17 across the evaluated tasks. For more technical details, refer to the EdgeRazor arXiv paper.