zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 13, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

The zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit model is a 4-bit quantized version of the Qwen3-1.7B base model, developed by zhangsq-nju using the EdgeRazor framework. It targets efficient deployment on edge devices: a 4-bit quantization scheme is applied across all layers while keeping performance close to the base model. It is intended for lightweight applications in resource-constrained environments, where a smaller memory footprint and faster inference are critical.


Model Overview

This model, zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit, is a 4-bit quantized variant of the Qwen3-1.7B base model. It was developed by zhangsq-nju utilizing the EdgeRazor framework, which focuses on mixed-precision quantization-aware distillation to create lightweight yet performant large language models. This specific repository provides a version where all embedding, decoder, and lm_head layers are quantized to 4-bit precision.
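
To give a rough sense of what quantizing every layer to 4 bits means for weight storage, the following back-of-the-envelope sketch compares 16-bit and 4-bit weights. It assumes a nominal parameter count of 1.7 billion (taken from the model name, not from a measured count) and that the weights are stored packed at the stated bit-width. It ignores quantization scales, activations, and the KV cache, so actual figures will differ.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count read off the "1.7B" in the model name.
NOMINAL_PARAMS = 1.7e9

bf16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # 16-bit baseline
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)   # 4-bit weights, packed

print(f"16-bit weights: {bf16_gb:.2f} GB")
print(f"4-bit weights:  {int4_gb:.2f} GB")
print(f"reduction:      {bf16_gb / int4_gb:.1f}x")
```

Under these assumptions the weights shrink from about 3.4 GB to about 0.85 GB, a 4x reduction before any per-group metadata is added back.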

Key Capabilities

  • Efficient Quantization: Applies 4-bit quantization across the entire model, reducing its memory footprint and computational requirements.
  • Performance Retention: Benchmarks indicate that the 4-bit EdgeRazor variant (4-16-16 configuration) averages 58.56 across the evaluated tasks, within 0.08 points of the 16-bit Qwen3-1.7B base model's 58.64 average.
  • Edge Deployment: Designed for scenarios requiring highly efficient LLMs, such as deployment on edge devices with limited resources.
  • Mixed-Precision Options: While this repository focuses on the 4-bit version, the EdgeRazor framework supports various mixed-precision recipes, including configurations down to 1.58-bit.
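
The performance-retention figures above can be put in relative terms with a few lines of arithmetic. This sketch uses only the two averages quoted in this card (58.56 and 58.64); the variable names are illustrative, and the per-task scores behind the averages are not reproduced here.

```python
# Benchmark averages as quoted in this model card.
BASE_16BIT_AVG = 58.64      # Qwen3-1.7B base model, 16-bit
EDGERAZOR_4BIT_AVG = 58.56  # EdgeRazor 4-bit variant (4-16-16 configuration)

gap = BASE_16BIT_AVG - EDGERAZOR_4BIT_AVG       # absolute drop in points
retention = EDGERAZOR_4BIT_AVG / BASE_16BIT_AVG  # fraction of base score kept

print(f"absolute gap: {gap:.2f} points")
print(f"retention:    {retention:.2%}")
```

The 4-bit variant keeps about 99.86% of the base model's average score, which is the sense in which it "closely matches" the 16-bit baseline.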

Good For

  • Resource-Constrained Environments: Ideal for applications on devices with limited memory and processing power.
  • Fast Inference: The reduced bit-width can contribute to faster inference, particularly where memory bandwidth is the bottleneck.
  • Maintaining Performance: Suitable for use cases that need near-original model performance together with significant efficiency gains.
  • Research in Quantization: Provides a practical example of the EdgeRazor framework's application for efficient LLM deployment.
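
For readers new to quantization, the sketch below shows what storing a value in 4 bits involves in the simplest case. It is a generic symmetric uniform quantizer written for illustration only: it is not the EdgeRazor method (which this card describes as mixed-precision quantization-aware distillation), and the function names and sample weights are hypothetical.

```python
def quantize_int4(values):
    """Map floats to signed 4-bit codes in [-8, 7] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Use 7 (the largest positive code) so the largest magnitude is representable.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to make the round trip easy to follow.
weights = [0.7, -0.3, 0.12, 0.0]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))

print("codes:    ", codes)
print("restored: ", [round(r, 3) for r in restored])
print("max error:", round(max_error, 4))
```

Each weight is replaced by one of 16 integer codes plus a shared scale, and the rounding error per weight is at most half the scale. Training-aware approaches aim to reduce the accuracy cost of that rounding rather than applying it after the fact.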