Name: zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.88bit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhangsq-nju

Model Overview

This model, zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.88bit, is a quantized version of the Qwen/Qwen3-1.7B base model. Developed by zhangsq-nju using the EdgeRazor framework, it is distinguished mainly by its aggressive mixed-precision quantization strategy.

Key Quantization Details

Decoder Layers: Quantized to an ultra-low average of 1.88 bits per weight.
Embedding and LM Head: Quantized to 4-bit precision.
The 1.88-bit figure is an average bit-width rather than a single precision: the configuration mixes 12.5% 4-bit and 87.5% 1.58-bit weights (0.125 × 4 + 0.875 × 1.58 ≈ 1.88), as detailed in the EdgeRazor paper.
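The average can be checked with a few lines of arithmetic. The sketch below is only an illustration of the weighted mean behind the stated mix; the function name `average_bit_width` and the variable names are hypothetical and do not come from the EdgeRazor code.

```python
def average_bit_width(mix):
    """Return the weighted average bit-width for (fraction, bits) pairs."""
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(fraction * bits for fraction, bits in mix)


# Mix stated in the listing: 12.5% at 4 bits, 87.5% at 1.58 bits.
edge_razor_mix = [(0.125, 4.0), (0.875, 1.58)]
avg_bits = average_bit_width(edge_razor_mix)

print(f"average bit-width: {avg_bits:.4f}")
```

Rounded to two decimals, the result matches the 1.88 in the model name.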

Performance Characteristics

The compression comes at a cost in accuracy. The full-precision Qwen3-1.7B scores 58.64 on the aggregated benchmark, whereas the 1.88-bit EdgeRazor variant scores 47.14 (with 16-16-16 W-A-KV) or 47.03 (with 8-8-8 W-A-KV), a drop of roughly 11.5 points. This trade-off prioritizes extreme efficiency over peak accuracy, making the model suitable for resource-constrained environments.
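To put the gap in perspective, the following sketch computes the absolute and relative drop from the three scores quoted above. It uses only those numbers; the helper `score_drop` and the dictionary layout are illustrative choices, not part of any benchmark tooling.

```python
BASELINE_SCORE = 58.64  # full-precision Qwen3-1.7B, as quoted above

# 1.88-bit EdgeRazor scores, keyed by the W-A-KV setting quoted above.
edge_razor_scores = {"16-16-16": 47.14, "8-8-8": 47.03}


def score_drop(baseline, score):
    """Return (absolute drop in points, drop as a fraction of the baseline)."""
    absolute = baseline - score
    return absolute, absolute / baseline


drops = {
    setting: score_drop(BASELINE_SCORE, score)
    for setting, score in edge_razor_scores.items()
}

for setting, (absolute, relative) in drops.items():
    print(f"W-A-KV {setting}: -{absolute:.2f} points ({relative:.1%} relative)")
```

Both settings lose a little under 20% of the baseline score, and the difference between the two settings (0.11 points) is small compared with the gap to full precision.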

Ideal Use Cases

Edge Device Deployment: Designed for scenarios where computational resources and memory are severely limited.
Low-Latency Inference: Its compact size benefits applications that require very fast response times.
Resource-Constrained AI: Suitable for integrating LLM capabilities into embedded systems or mobile applications where larger models are impractical.
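For a rough sense of why the model suits memory-limited devices, the sketch below estimates weight storage at 16 bits versus an average of 1.88 bits per weight. It rests on loud assumptions: the nominal 1.7 billion parameter count is taken from the model name, the 1.88-bit average is applied to every weight, and quantization metadata (scales, zero points), activations, and the KV cache are ignored. Real file sizes will differ.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 1.7e9  # assumption: nominal count implied by "1.7B"

fp16_gb = weight_footprint_gb(NOMINAL_PARAMS, 16)    # 16-bit baseline
edge_gb = weight_footprint_gb(NOMINAL_PARAMS, 1.88)  # 1.88-bit average
compression_ratio = fp16_gb / edge_gb

print(f"16-bit weights:   ~{fp16_gb:.2f} GB")
print(f"1.88-bit weights: ~{edge_gb:.2f} GB")
print(f"compression:      ~{compression_ratio:.1f}x")
```

Under these assumptions the weights shrink from about 3.4 GB to about 0.4 GB, roughly an 8.5x reduction.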

Full Model Card (README)