zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit
zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit is a 4-bit quantized version of the Qwen3-1.7B base model, developed by zhangsq-nju with the EdgeRazor framework. EdgeRazor supports mixed-precision quantization; this release applies 4-bit quantization to all layers. It targets lightweight applications on edge devices and other resource-constrained environments, where a smaller memory footprint and faster inference matter most.
Model Overview
This model, zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit, is a 4-bit quantized variant of the Qwen3-1.7B base model. It was developed by zhangsq-nju using the EdgeRazor framework, which applies mixed-precision quantization-aware distillation to produce lightweight yet performant large language models. This repository provides the version in which all embedding, decoder, and lm_head layers are quantized to 4-bit precision.
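To make "quantized to 4-bit precision" concrete, the sketch below shows a generic symmetric round-to-nearest scheme that maps floating-point weights to signed 4-bit integer codes plus one shared scale. This is an illustration only and not EdgeRazor's actual method, which the description above says relies on quantization-aware distillation; the function names and the sample weights are hypothetical.

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit codes in [-8, 7] using one shared scale.

    Generic round-to-nearest quantization, shown for illustration only;
    it is NOT the EdgeRazor algorithm.
    """
    max_abs = max(abs(w) for w in weights)
    # Scale chosen so the largest-magnitude weight maps to code +/-7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]


# Hypothetical sample weights, not taken from the model.
weights = [0.12, -0.53, 0.91, -0.07, 0.33]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:   ", codes)
print("scale:   ", scale)
print("restored:", restored)
print("max abs error:", max_error)
```

Each weight is stored as one of 16 integer levels, and the rounding error per weight is bounded by half the scale. Practical schemes usually keep one scale per small group of weights rather than one per tensor, which tightens that bound at the cost of a little extra storage.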
Key Capabilities
- Efficient Quantization: Applies 4-bit quantization across the entire model (embedding, decoder, and lm_head layers), significantly reducing its memory footprint and computational requirements.
- Performance Retention: Benchmarks indicate that the 4-bit EdgeRazor variant (4-16-16 configuration) reaches an average score of 58.56 across various tasks, 0.08 points below the 58.64 average of the 16-bit Qwen3-1.7B base model.
- Edge Deployment: Designed for scenarios requiring highly efficient LLMs, such as deployment on edge devices with limited resources.
- Mixed-Precision Options: While this repository focuses on the 4-bit version, the EdgeRazor framework supports various mixed-precision recipes, including configurations down to 1.58-bit.
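The efficiency and retention claims in the list above can be checked with back-of-the-envelope arithmetic. The sketch below assumes a nominal parameter count of 1.7 billion, taken from the model name rather than from a measured value, and counts weight storage only. It ignores quantization scales, activations, and the KV cache, so real memory use will be higher. The benchmark averages are the two figures quoted above.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: nominal parameter count implied by the name "Qwen3-1.7B".
NOMINAL_PARAMS = 1.7e9

fp16_gib = weight_memory_gib(NOMINAL_PARAMS, 16)
int4_gib = weight_memory_gib(NOMINAL_PARAMS, 4)
compression = fp16_gib / int4_gib

# Benchmark averages quoted in this model card.
base_avg = 58.64   # 16-bit Qwen3-1.7B base model
quant_avg = 58.56  # 4-bit EdgeRazor variant
gap = base_avg - quant_avg
retention = quant_avg / base_avg

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {int4_gib:.2f} GiB")
print(f"compression:    {compression:.1f}x")
print(f"score gap:      {gap:.2f} points")
print(f"retention:      {retention:.2%}")
```

Under these assumptions the weights shrink by a factor of four while the average score drops by 0.08 points, which is the trade-off this release is built around.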
Good For
- Resource-Constrained Environments: Ideal for applications on devices with limited memory and processing power.
- Fast Inference: The reduced bit-width lowers memory traffic, which can speed up inference on hardware and runtimes that support 4-bit weights.
- Maintaining Performance: Suitable for use cases that need performance close to the 16-bit base model alongside significant efficiency gains.
- Research in Quantization: Provides a practical example of the EdgeRazor framework's application for efficient LLM deployment.