Name: zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.88bit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhangsq-nju

Model Overview: Qwen3-0.6B-EdgeRazor-1.88bit

This model is a highly quantized version of the Qwen3-0.6B base model, developed by zhangsq-nju using their EdgeRazor framework. Its primary differentiator is the aggressive mixed-precision quantization, specifically utilizing a 1.88-bit bit-width for all decoder layers and 4-bit for embedding and lm_head. This makes it exceptionally lightweight and suitable for environments with severe memory and computational constraints.

Key Capabilities & Features

Extreme Quantization: Achieves a 1.88-bit average bit-width for core components, significantly reducing model size and inference cost.
Edge Deployment: Optimized for deployment on resource-limited edge devices where traditional LLMs are impractical.
Performance Trade-off: While highly compressed, it maintains a competitive average performance of 41.76 across various benchmarks (ARC-e, HellaS., MMLU, GSM8K, etc.) compared to its less quantized counterparts, demonstrating effective quantization-aware distillation.
Easy Integration: Provides a straightforward transformers library quickstart for inference, including support for activation and KV cache quantization via trust_remote_code=True.

Should You Use This Model?

This model is ideal for use cases where:

Resource Constraints are Paramount: You need an LLM to run on devices with very limited memory or processing power.
Efficiency is Critical: Minimizing inference latency and energy consumption is a top priority.
Small Model Footprint: A compact model size is essential for deployment or distribution.

It's important to note that while performance is optimized for its size, there is an inherent trade-off compared to full-precision or less quantized models. Evaluate its benchmark scores against your specific task requirements.

Overview

Model Overview: Qwen3-0.6B-EdgeRazor-1.88bit

Key Capabilities & Features

Should You Use This Model?

Full Model Card (README)