zhangsq-nju/Qwen3-0.6B-EdgeRazor-4bit
zhangsq-nju/Qwen3-0.6B-EdgeRazor-4bit is a 0.6-billion-parameter language model based on the Qwen3 architecture, developed by zhangsq-nju. It is optimized for deployment on edge devices through 4-bit mixed-precision quantization applied to all embedding, decoder, and lm_head layers. It targets lightweight LLM applications, aiming for competitive performance with a much smaller memory footprint and lower compute requirements.
Model Overview
zhangsq-nju/Qwen3-0.6B-EdgeRazor-4bit is derived from the Qwen/Qwen3-0.6B base model. It has been fine-tuned and quantized with the EdgeRazor framework, developed by zhangsq-nju, for efficient edge deployment.
Key Features & Quantization
- Base Model: Qwen/Qwen3-0.6B.
- Quantization: Uses a 4-bit mixed-precision quantization scheme that applies 4-bit quantization to all embedding, decoder, and lm_head layers. This is the most aggressive 4-bit configuration offered by EdgeRazor, aiming for maximum compression.
- Performance: Despite the aggressive 4-bit quantization, the model remains competitive with the original 16-bit Qwen3-0.6B across the reported benchmarks. For instance, the 4-bit EdgeRazor (4-16-16) achieves an average score of 47.83, slightly above the base Qwen3-0.6B's 47.35.
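To make the idea of 4-bit quantization concrete, the following is a minimal sketch of plain symmetric per-tensor rounding to a signed 4-bit range. It is not the EdgeRazor algorithm, whose details are not described here; the function names `quantize_int4` and `dequantize_int4` and the sample weights are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7].

    Illustrative only: this is generic round-to-nearest quantization,
    not the scheme used by EdgeRazor.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto code 7; fall back to 1.0 for all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from 4-bit codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.70, -0.32, 0.11, -0.70, 0.0]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight is stored as one of 16 integer codes plus a shared scale, so the round-trip error for in-range values is bounded by half the scale. Practical schemes typically use finer-grained scales (per channel or per group) and may involve fine-tuning to recover accuracy.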
Use Cases
This model is particularly well-suited for scenarios requiring:
- Resource-constrained environments: Ideal for deployment on edge devices, mobile applications, or embedded systems where memory and computational power are limited.
- Efficient inference: 4-bit weights nominally take about a quarter of the storage of the 16-bit original (before quantization metadata), which reduces the model's footprint and can speed up inference.
- Lightweight LLM applications: Suitable for tasks where a smaller, faster model is preferred over larger, more computationally intensive alternatives, while maintaining reasonable performance.
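The footprint reduction mentioned above can be estimated with simple arithmetic. The sketch below assumes a nominal parameter count of exactly 600 million (the "0.6B" figure rounded) and counts only raw weight storage; it ignores quantization metadata such as scales, as well as activations and the KV cache, so real numbers will differ. The helper name `weight_bytes` is hypothetical.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Raw storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


# Assumption: nominal "0.6B" parameters; the exact count is not given here.
NUM_PARAMS = 600_000_000

fp16 = weight_bytes(NUM_PARAMS, 16)  # original 16-bit weights
int4 = weight_bytes(NUM_PARAMS, 4)   # all layers at 4 bits

print(f"16-bit weights: {fp16 / 1e9:.2f} GB")
print(f"4-bit weights:  {int4 / 1e9:.2f} GB")
print(f"reduction:      {fp16 / int4:.0f}x")
```

Under these assumptions the weights shrink from about 1.2 GB to about 0.3 GB, which is the kind of saving that matters on memory-limited edge hardware.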