Name: zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhangsq-nju

Model Overview

This model, zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit, is a 4-bit quantized variant of the Qwen3-1.7B base model. It was developed by zhangsq-nju utilizing the EdgeRazor framework, which focuses on mixed-precision quantization-aware distillation to create lightweight yet performant large language models. This specific repository provides a version where all embedding, decoder, and lm_head layers are quantized to 4-bit precision.

Key Capabilities

Efficient Quantization: Achieves a 4-bit quantization across the entire model, significantly reducing its memory footprint and computational requirements.
Performance Retention: Benchmarks indicate that the 4-bit EdgeRazor variant (4-16-16 configuration) maintains an average performance of 58.56 across various tasks, closely matching the 16-bit Qwen3-1.7B base model's 58.64 average.
Edge Deployment: Designed for scenarios requiring highly efficient LLMs, such as deployment on edge devices with limited resources.
Mixed-Precision Options: While this repository focuses on the 4-bit version, the EdgeRazor framework supports various mixed-precision recipes, including configurations down to 1.58-bit.

Good For

Resource-Constrained Environments: Ideal for applications on devices with limited memory and processing power.
Fast Inference: The reduced bit-width contributes to faster inference speeds.
Maintaining Performance: Suitable for use cases where near-original model performance is required but with significant efficiency gains.
Research in Quantization: Provides a practical example of the EdgeRazor framework's application for efficient LLM deployment.

Overview

Model Overview

Key Capabilities

Good For

Full Model Card (README)