Name: zhangsq-nju/Qwen3-1.7B-EdgeRazor-2.79bit API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhangsq-nju

Overview

zhangsq-nju/Qwen3-1.7B-EdgeRazor-2.79bit is a highly optimized, lightweight large language model derived from the Qwen3-1.7B base model. Developed by zhangsq-nju using their EdgeRazor framework, this model implements a mixed-precision quantization strategy to significantly reduce its memory footprint and computational requirements. Specifically, it utilizes a 2.79-bit quantization for all decoder layers, combined with 4-bit quantization for the embedding and lm_head components.

This model is designed for efficient deployment in resource-constrained environments, such as edge devices, where minimizing model size and maximizing inference speed are critical. The EdgeRazor framework employs quantization-aware distillation to achieve this balance, aiming to preserve performance while drastically reducing bit-width.

Key Capabilities

Highly Quantized: Achieves a 2.79-bit average precision across decoder layers, leading to a compact model size.
Efficient Inference: Optimized for faster execution and lower memory consumption on hardware with limited resources.
Performance Retention: Benchmarks show competitive performance compared to the full-precision Qwen3-1.7B, with an average score of 53.33 across various tasks for the 2.79-bit configuration.
Instruction-Tuned: Supports instruction-following for chat-based applications.

Good for

Edge Device Deployment: Ideal for running LLM applications directly on mobile phones, IoT devices, or other embedded systems.
Resource-Constrained Environments: Suitable for scenarios where GPU memory or computational power is limited.
Cost-Effective Inference: Reduces the operational costs associated with running large language models.
General Language Tasks: Capable of handling a wide range of natural language understanding and generation tasks efficiently.

Overview

Overview

Key Capabilities

Good for

Full Model Card (README)