zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit model is a highly quantized version of the Qwen3-1.7B base model, developed by zhangsq-nju using the EdgeRazor framework. It applies aggressive 1.58-bit quantization to all decoder layers and 4-bit quantization to the embedding and lm_head layers, significantly reducing its memory footprint. The model is designed for efficient deployment on edge devices, trading some accuracy for operation under extreme resource constraints, and is well suited to applications that need a very lightweight LLM with minimal computational overhead.


Model Overview

The zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit is a highly optimized, quantized version of the Qwen3-1.7B large language model, developed by zhangsq-nju using their EdgeRazor framework. The primary innovation lies in its aggressive mixed-precision quantization strategy, which targets extreme efficiency for resource-constrained environments.

Key Quantization Details

  • Base Model: Qwen/Qwen3-1.7B
  • Quantization Method: EdgeRazor framework, which employs a mixed-precision quantization-aware distillation approach.
  • Bit-Widths: 1.58-bit quantization for all decoder layers, with the embedding and lm_head layers quantized to 4-bit. This is the most aggressive quantization recipe EdgeRazor offers for this model.
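The card does not publish EdgeRazor's quantization kernels. The "1.58-bit" figure conventionally denotes ternary weights, since log2(3) ≈ 1.58 bits per weight, as popularized by BitNet b1.58. Below is a minimal sketch of absmean ternary quantization in that style; EdgeRazor's actual method may differ.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight tensor to {-1, 0, +1} times a per-tensor scale.

    Follows the absmean recipe from BitNet b1.58 as an illustration;
    this is an assumption, not EdgeRazor's documented algorithm.
    """
    scale = np.abs(w).mean() + eps            # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)   # ternary codes in {-1, 0, 1}
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct an approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = ternary_quantize(w)
w_hat = dequantize(q, s)
print(sorted(np.unique(q).tolist()))  # a subset of [-1, 0, 1]
```

Each weight then needs only one of three states plus a shared scale, which is what enables the extreme footprint reduction described above.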

Performance Characteristics

While significantly reducing the model's size and computational requirements, the 1.58-bit EdgeRazor variant shows a performance trade-off compared to the full-precision Qwen3-1.7B. For instance, its average score across various benchmarks (including ARC-e, HellaS., MMLU, GSM8K, HumanE.) is 43.89%, compared to the base model's 58.64%. This indicates its suitability for scenarios where extreme efficiency is prioritized over peak accuracy.
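Put concretely, the reported averages imply the 1.58-bit variant retains roughly three quarters of the base model's average benchmark score. A quick check of the arithmetic:

```python
base_avg = 58.64    # full-precision Qwen3-1.7B average (from this card)
quant_avg = 43.89   # 1.58-bit EdgeRazor variant average (from this card)

retention = quant_avg / base_avg
drop = base_avg - quant_avg

print(f"Score retention: {retention:.1%}")       # ~74.8%
print(f"Absolute drop:   {drop:.2f} points")     # 14.75 points
```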

Use Cases

This model is particularly well-suited for:

  • Edge Device Deployment: Its ultra-low bit-width makes it ideal for running LLM inference on devices with very limited memory and processing power.
  • Resource-Constrained Applications: Scenarios where a lightweight model is critical, such as mobile applications, embedded systems, or IoT devices.
  • Research in Extreme Quantization: Provides a practical example of highly quantized LLMs for further study and development in efficient AI.
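To make the edge-deployment claim tangible, the weight footprint can be estimated from the stated bit-widths. The parameter split below is a rough assumption (about 1.4B decoder-layer parameters and about 0.3B embedding parameters for Qwen3-1.7B, with a tied lm_head); the card itself does not publish these numbers.

```python
# Hypothetical parameter split for Qwen3-1.7B (assumption, not from the card).
DECODER_PARAMS = 1.4e9   # decoder layers, quantized to 1.58 bits/weight
EMBED_PARAMS = 0.3e9     # embedding (tied lm_head), quantized to 4 bits/weight

def gib(n_bytes: float) -> float:
    return n_bytes / 2**30

quant_bytes = DECODER_PARAMS * 1.58 / 8 + EMBED_PARAMS * 4 / 8
bf16_bytes = (DECODER_PARAMS + EMBED_PARAMS) * 2  # 2 bytes per BF16 weight

print(f"1.58-bit recipe: ~{gib(quant_bytes):.2f} GiB")
print(f"BF16 baseline:   ~{gib(bf16_bytes):.2f} GiB")
print(f"Compression:     ~{bf16_bytes / quant_bytes:.1f}x")
```

Under these assumptions the quantized weights fit in well under 1 GiB, roughly an 8x reduction versus BF16, which is what makes the model plausible on memory-constrained edge hardware.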