Name: zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.58bit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhangsq-nju

EdgeRazor for Lightweight LLMs

This model, zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.58bit, is a highly optimized, quantized version of the Qwen3-0.6B base model. Developed by zhangsq-nju using the EdgeRazor framework, it implements a mixed-precision quantization strategy to achieve extreme efficiency. Specifically, all decoder layers are quantized to a very low 1.58-bit, while the embedding and lm_head layers maintain 4-bit precision. This aggressive quantization significantly reduces the model's size and computational requirements, making it ideal for deployment on edge devices and in environments with limited resources.

Key Capabilities

Extreme Quantization: Achieves a 1.58-bit average bit-width across its core layers, drastically minimizing memory footprint.
Edge Deployment: Designed for efficient inference on resource-constrained hardware.
Instruction-Tuned: Optimized for instruct mode, as indicated by the enable_thinking=False setting in the quickstart example.
Performance-Efficiency Trade-off: Provides a balance between maintaining reasonable performance on various benchmarks (e.g., ARC-e, HellaS., MMLU) and achieving ultra-low bit-width.

Good for

Deploying LLMs on edge devices or embedded systems.
Applications requiring minimal memory and computational overhead.
Scenarios where a slight trade-off in raw performance is acceptable for significant efficiency gains.
Research and development in ultra-low-bit quantization for large language models.

Overview

EdgeRazor for Lightweight LLMs

Key Capabilities

Good for

Full Model Card (README)