zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit model is a highly quantized version of the Qwen3-1.7B base model, developed by zhangsq-nju using the EdgeRazor framework. It applies aggressive 1.58-bit quantization to all decoder layers and 4-bit quantization to the embedding and lm_head layers, significantly reducing its memory footprint. The model is designed for efficient deployment on edge devices, trading some accuracy for operation under extreme resource constraints, and is well suited to applications that need a very lightweight LLM with minimal computational overhead.


Model Overview

The zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit is a highly optimized, quantized version of the Qwen3-1.7B large language model, developed by zhangsq-nju using their EdgeRazor framework. The primary innovation lies in its aggressive mixed-precision quantization strategy, which targets extreme efficiency for resource-constrained environments.

Key Quantization Details

  • Base Model: Qwen/Qwen3-1.7B
  • Quantization Method: EdgeRazor framework, which employs a mixed-precision quantization-aware distillation approach.
  • Bit-Widths: 1.58-bit quantization for all decoder layers, with the embedding and lm_head layers quantized to 4-bit. This is the most aggressive quantization recipe EdgeRazor offers for this model.
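The card does not publish EdgeRazor's quantization kernels. The "1.58-bit" figure conventionally denotes ternary weights, since log2(3) ≈ 1.58 bits per weight, as popularized by BitNet b1.58. Below is a minimal sketch of absmean ternary quantization in that style; EdgeRazor's actual method may differ.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight tensor to {-1, 0, +1} times a per-tensor scale.

    Follows the absmean recipe from BitNet b1.58 as an illustration;
    this is an assumption, not EdgeRazor's documented algorithm.
    """
    scale = np.abs(w).mean() + eps            # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)   # ternary codes in {-1, 0, 1}
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct an approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = ternary_quantize(w)
w_hat = dequantize(q, s)
print(sorted(np.unique(q).tolist()))  # a subset of [-1, 0, 1]
```

Each weight then needs only one of three states plus a shared scale, which is what enables the extreme footprint reduction described above.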

Performance Characteristics

While significantly reducing the model's size and computational requirements, the 1.58-bit EdgeRazor variant shows a performance trade-off compared to the full-precision Qwen3-1.7B. For instance, its average score across various benchmarks (including ARC-e, HellaS., MMLU, GSM8K, HumanE.) is 43.89%, compared to the base model's 58.64%. This indicates its suitability for scenarios where extreme efficiency is prioritized over peak accuracy.
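Put concretely, the reported averages imply the 1.58-bit variant retains roughly three quarters of the base model's average benchmark score. A quick check of the arithmetic:

```python
base_avg = 58.64    # full-precision Qwen3-1.7B average (from this card)
quant_avg = 43.89   # 1.58-bit EdgeRazor variant average (from this card)

retention = quant_avg / base_avg
drop = base_avg - quant_avg

print(f"Score retention: {retention:.1%}")       # ~74.8%
print(f"Absolute drop:   {drop:.2f} points")     # 14.75 points
```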

Use Cases

This model is particularly well-suited for:

  • Edge Device Deployment: Its ultra-low bit-width makes it ideal for running LLM inference on devices with very limited memory and processing power.
  • Resource-Constrained Applications: Scenarios where a lightweight model is critical, such as mobile applications, embedded systems, or IoT devices.
  • Research in Extreme Quantization: Provides a practical example of highly quantized LLMs for further study and development in efficient AI.
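To make the edge-deployment claim tangible, the weight footprint can be estimated from the stated bit-widths. The parameter split below is a rough assumption (about 1.4B decoder-layer parameters and about 0.3B embedding parameters for Qwen3-1.7B, with a tied lm_head); the card itself does not publish these numbers.

```python
# Hypothetical parameter split for Qwen3-1.7B (assumption, not from the card).
DECODER_PARAMS = 1.4e9   # decoder layers, quantized to 1.58 bits/weight
EMBED_PARAMS = 0.3e9     # embedding (tied lm_head), quantized to 4 bits/weight

def gib(n_bytes: float) -> float:
    return n_bytes / 2**30

quant_bytes = DECODER_PARAMS * 1.58 / 8 + EMBED_PARAMS * 4 / 8
bf16_bytes = (DECODER_PARAMS + EMBED_PARAMS) * 2  # 2 bytes per BF16 weight

print(f"1.58-bit recipe: ~{gib(quant_bytes):.2f} GiB")
print(f"BF16 baseline:   ~{gib(bf16_bytes):.2f} GiB")
print(f"Compression:     ~{bf16_bytes / quant_bytes:.1f}x")
```

Under these assumptions the quantized weights fit in well under 1 GiB, roughly an 8x reduction versus BF16, which is what makes the model plausible on memory-constrained edge hardware.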