zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.88bit
The zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.88bit model is a 0.6 billion parameter language model based on the Qwen3 architecture, developed by zhangsq-nju. It uses mixed-precision quantization produced by the EdgeRazor framework: an average of 1.88 bits for all decoder layers and 4 bits for the embedding and lm_head. The model is designed for efficient deployment on edge devices, balancing task performance against tight memory and compute budgets.
Model Overview: Qwen3-0.6B-EdgeRazor-1.88bit
This model is a highly quantized version of the Qwen3-0.6B base model, developed by zhangsq-nju using their EdgeRazor framework. Its primary differentiator is aggressive mixed-precision quantization: an average bit-width of 1.88 bits for all decoder layers and 4 bits for the embedding and lm_head. This makes it exceptionally lightweight and suitable for environments with severe memory and computational constraints.
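To make the bit-widths above concrete, the following is a rough back-of-the-envelope sketch of weight storage. The parameter counts are approximate assumptions for a 0.6B-parameter model, not figures from this model card; the embedding and lm_head are counted once (assuming tied weights), and quantization metadata such as scales is ignored, so real files will be somewhat larger.

```python
def quantized_size_bytes(decoder_params, embed_params,
                         decoder_bits=1.88, embed_bits=4.0):
    """Estimate weight storage in bytes, ignoring quantization metadata."""
    total_bits = decoder_params * decoder_bits + embed_params * embed_bits
    return total_bits / 8


# Assumed, approximate parameter counts (hypothetical, for illustration only).
DECODER_PARAMS = 440_000_000   # decoder layers, quantized to ~1.88 bits on average
EMBED_PARAMS = 156_000_000     # embedding / lm_head, quantized to 4 bits

quantized = quantized_size_bytes(DECODER_PARAMS, EMBED_PARAMS)
fp16 = (DECODER_PARAMS + EMBED_PARAMS) * 16 / 8  # 16-bit baseline for comparison

print(f"quantized weights: {quantized / 1e6:.1f} MB")
print(f"fp16 weights:      {fp16 / 1e6:.1f} MB")
print(f"compression ratio: {fp16 / quantized:.1f}x")
```

Under these assumptions the 4-bit embedding accounts for a large share of the quantized total, which is why the card reports its bit-width separately from the 1.88-bit decoder layers.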
Key Capabilities & Features
- Extreme Quantization: Achieves a 1.88-bit average bit-width for core components, significantly reducing model size and inference cost.
- Edge Deployment: Optimized for deployment on resource-limited edge devices where traditional LLMs are impractical.
- Performance Trade-off: Despite the heavy compression, it maintains an average score of 41.76 across benchmarks including ARC-e, HellaSwag, MMLU, and GSM8K, which is competitive with its less quantized counterparts and indicates that the quantization-aware distillation was effective.
- Easy Integration: Provides a straightforward transformers library quickstart for inference, including support for activation and KV cache quantization via trust_remote_code=True.
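As a minimal sketch of what such a quickstart might look like, the helper below loads the model with the standard transformers auto classes and trust_remote_code=True. It assumes the repository's custom code works with AutoModelForCausalLM; the helper name generate is hypothetical, and any options for activation or KV cache quantization are not shown because this page does not specify them.

```python
MODEL_ID = "zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.88bit"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return its completion of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True allows the repository's custom model code to run.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate("Explain quantization in one sentence.") downloads the weights on first use. Because trust_remote_code=True executes code from the model repository, review that code before running it in a sensitive environment.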
Should You Use This Model?
This model is ideal for use cases where:
- Resource Constraints are Paramount: You need an LLM to run on devices with very limited memory or processing power.
- Efficiency is Critical: Minimizing inference latency and energy consumption is a top priority.
- Small Model Footprint: A compact model size is essential for deployment or distribution.
Although performance is optimized for its size, there is an inherent accuracy trade-off compared to full-precision or less quantized models. Evaluate its benchmark scores against your specific task requirements.