Name: zhangsq-nju/Qwen3-0.6B-EdgeRazor-4bit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhangsq-nju

Model Overview

zhangsq-nju/Qwen3-0.6B-EdgeRazor-4bit is a 0.6-billion-parameter language model derived from the Qwen/Qwen3-0.6B base model. It was fine-tuned and quantized with the EdgeRazor framework, developed by zhangsq-nju, to reduce memory and compute requirements for edge deployments.

Key Features & Quantization

Base Model: Qwen/Qwen3-0.6B.
Quantization: Utilizes a 4-bit mixed-precision quantization scheme, applying 4-bit quantization to all embedding, decoder, and lm_head layers. This is the most aggressive 4-bit configuration offered by EdgeRazor, aiming for maximum compression.
Performance: Despite the aggressive 4-bit quantization, the model is reported to remain competitive with the original 16-bit Qwen3-0.6B across the benchmarks provided. For instance, the 4-bit EdgeRazor configuration labeled 4-16-16 achieves an average score of 47.83, against 47.35 for the base Qwen3-0.6B, a difference of 0.48 points.
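
To make the idea of 4-bit quantization concrete, the following is a minimal sketch of generic symmetric round-to-nearest quantization over a single list of weights. It is an illustration only and is not EdgeRazor's actual algorithm, which this page does not describe; the function names quantize_4bit and dequantize_4bit and the sample weights are hypothetical.

```python
def quantize_4bit(weights):
    """Generic symmetric round-to-nearest quantization to signed 4-bit codes.

    Illustrative only: this is NOT the EdgeRazor method.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to code 7; use 1.0 for an all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Clamp to the signed 4-bit range [-8, 7].
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate floating-point weights from 4-bit codes."""
    return [c * scale for c in codes]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.70, -0.30, 0.12, 0.0, -0.70]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight is stored as one of 16 integer codes plus a shared scale, so the rounding error per weight is at most half the scale. Practical schemes typically use per-group scales and training-aware techniques to limit the accuracy loss.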

Use Cases

This model is particularly well-suited for scenarios requiring:

Resource-constrained environments: Ideal for deployment on edge devices, mobile applications, or embedded systems where memory and computational power are limited.
Efficient inference: 4-bit weights occupy roughly a quarter of the storage of 16-bit weights, which shrinks the model's memory footprint and can speed up inference.
Lightweight LLM applications: Suitable for tasks where a smaller, faster model is preferred over larger, more computationally intensive alternatives, while maintaining reasonable performance.
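
The footprint reduction mentioned above can be estimated with simple arithmetic. The sketch below assumes a nominal 0.6 billion parameters (taken from the model name; the exact count may differ) and that every weight is stored at the stated bit width. It ignores quantization metadata such as scales, as well as activations and the KV cache, so real memory use will be somewhat higher. The helper name weight_bytes is hypothetical.

```python
def weight_bytes(num_params, bits_per_weight):
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


# Assumption: nominal parameter count from the model name.
NUM_PARAMS = 0.6e9

fp16_gb = weight_bytes(NUM_PARAMS, 16) / 1e9
int4_gb = weight_bytes(NUM_PARAMS, 4) / 1e9

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
print(f"Reduction:      {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the weights drop from about 1.2 GB to about 0.3 GB, which is the kind of saving that matters on memory-limited edge and mobile hardware.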
