zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit
The zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit model is a highly quantized version of the Qwen3-1.7B base model, developed by zhangsq-nju using the EdgeRazor framework. It applies aggressive 1.58-bit quantization to all decoder layers and 4-bit quantization to the embedding and lm_head layers, significantly reducing its memory footprint. It is designed for efficient deployment on edge devices, trading some accuracy for operation under extreme resource constraints, and suits applications that need a very lightweight LLM with minimal computational overhead.
Model Overview
The zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit is a highly optimized, quantized version of the Qwen3-1.7B large language model, developed by zhangsq-nju using their EdgeRazor framework. The primary innovation lies in its aggressive mixed-precision quantization strategy, which targets extreme efficiency for resource-constrained environments.
Key Quantization Details
- Base Model: Qwen/Qwen3-1.7B
- Quantization Method: EdgeRazor framework, which employs a mixed-precision quantization-aware distillation approach.
- Bit-Widths: 1.58-bit quantization for all decoder layers, while the embedding and lm_head layers are quantized to 4-bit. This is the most aggressive quantization recipe offered by EdgeRazor for this model.
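A bit-width of 1.58 bits corresponds to ternary weights, since log2(3) ≈ 1.58 and each weight takes one of three values {-1, 0, +1}. The model card does not document EdgeRazor's exact quantizer, but a BitNet-b1.58-style absmean ternary quantizer is a common way to realize this bit-width; the sketch below (all names and the recipe are illustrative assumptions, not EdgeRazor internals) shows the idea:

```python
import numpy as np

def ternary_quantize(w, eps=1e-6):
    """Absmean ternary quantization (illustrative, BitNet-b1.58 style).

    Scales the weight matrix by its mean absolute value, then rounds
    each entry to the nearest value in {-1, 0, +1}.
    """
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q.astype(np.int8), float(scale)

def dequantize(q, scale):
    # Reconstruct approximate float weights from ternary codes + scale.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 8)).astype(np.float32)
q, s = ternary_quantize(w)
w_hat = dequantize(q, s)
```

Each quantized weight can then be packed at roughly 1.58 bits (e.g. five ternary values per byte, since 3^5 = 243 < 256), which is where the memory savings come from.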
Performance Characteristics
While significantly reducing the model's size and computational requirements, the 1.58-bit EdgeRazor variant shows a performance trade-off compared to the full-precision Qwen3-1.7B. Its average score across the reported benchmarks (ARC-e, HellaSwag, MMLU, GSM8K, and HumanEval) is 43.89%, compared to the base model's 58.64%. It is therefore best suited to scenarios where extreme efficiency is prioritized over peak accuracy.
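The efficiency side of this trade-off can be sketched with back-of-the-envelope arithmetic. The parameter split below is an assumption (roughly 1.7B total parameters, with a few hundred million in the embedding/lm_head); actual counts for Qwen3-1.7B may differ:

```python
# Rough memory estimate for the mixed-precision recipe.
# Parameter split is an assumption, not from the model card.
embed_params = 0.3e9    # embedding + lm_head, quantized to 4-bit
decoder_params = 1.4e9  # decoder layers, quantized to 1.58-bit

# Bytes = params * bits / 8
fp16_bytes = (embed_params + decoder_params) * 16 / 8
quant_bytes = decoder_params * 1.58 / 8 + embed_params * 4 / 8

print(f"FP16 weights:      {fp16_bytes / 2**20:.0f} MiB")
print(f"EdgeRazor weights: {quant_bytes / 2**20:.0f} MiB")
print(f"Compression:       {fp16_bytes / quant_bytes:.1f}x")
```

Under these assumptions the quantized weights fit in well under half a gigabyte, roughly an 8x reduction versus FP16, which is what makes the model viable on memory-limited edge hardware.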
Use Cases
This model is particularly well-suited for:
- Edge Device Deployment: Its ultra-low bit-width makes it ideal for running LLM inference on devices with very limited memory and processing power.
- Resource-Constrained Applications: Scenarios where a lightweight model is critical, such as mobile applications, embedded systems, or IoT devices.
- Research in Extreme Quantization: Provides a practical example of highly quantized LLMs for further study and development in efficient AI.