SipsaLabs/qwen3-1.7b-uc2p79

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 29, 2026 · Architecture: Transformer · Cold

SipsaLabs/qwen3-1.7b-uc2p79 is a 1.7 billion parameter Qwen3-based language model from Sipsa Labs, Inc. that features patent-pending UltraCompress row-overlay quantization, achieving an effective 2.7767 bits per weight. This model is optimized for efficient inference in edge and on-device deployments, offering significant compression while maintaining high quality. It is designed for research and evaluation, providing a highly compressed variant of the Qwen3-1.7B base model.


UltraCompress Qwen3-1.7B: Highly Compressed LLM for Efficient Inference

SipsaLabs/qwen3-1.7b-uc2p79 is a compressed variant of the Qwen/Qwen3-1.7B model, developed by Sipsa Labs, Inc. It utilizes their patent-pending UltraCompress low-rank correction overlay method, achieving an impressive 2.7767 bits per weight (bpw). This results in a significantly smaller model footprint, with the packed binary (model.uc.bin) being approximately 491 MB compared to the FP16 reconstruction at ~3.3 GB.
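The quoted file sizes imply the compression ratio directly. A minimal sketch of the arithmetic, using the approximate sizes stated above (actual on-disk byte counts may differ slightly):

```python
# Rough size arithmetic from the figures quoted above.
packed_bytes = 491e6   # model.uc.bin, ~491 MB
fp16_bytes = 3.3e9     # FP16 reconstruction, ~3.3 GB

ratio = fp16_bytes / packed_bytes
print(f"Packed binary is ~{ratio:.1f}x smaller than FP16")  # ~6.7x
```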

Key Capabilities

  • Extreme Compression: Achieves sub-3 bpw compression using row-overlay quantization, making it highly efficient for resource-constrained environments.
  • Quality Retention: Demonstrates non-catastrophic degradation across a 6-model cohort; Qwen3-1.7B retains 93.81% of its FP16 baseline (T1 retention) on WikiText-103 perplexity. On HellaSwag (n=200), performance is statistically indistinguishable from the FP16 baseline.
  • Scalable Retention: UltraCompress's retention scales positively with model size, showing a 2.2x steeper scaling slope compared to bitsandbytes NF4.
  • Dual Format Support: Available as a standard model.safetensors for transformers compatibility and a highly packed model.uc.bin for use with the ultracompress runtime.
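The dual-format layout suggests a simple loading strategy: prefer the packed binary when the ultracompress runtime is available, and fall back to the standard safetensors file for plain transformers loading otherwise. A minimal sketch, assuming the runtime ships as an importable `ultracompress` Python package (the package name is an assumption; consult the runtime's own documentation):

```python
import importlib.util

REPO_ID = "SipsaLabs/qwen3-1.7b-uc2p79"

def pick_format() -> str:
    """Prefer the packed model.uc.bin when the (assumed) ultracompress
    package is importable; otherwise fall back to model.safetensors,
    which loads with standard transformers tooling."""
    if importlib.util.find_spec("ultracompress") is not None:
        return "model.uc.bin"
    return "model.safetensors"

print(f"Would load {pick_format()} from {REPO_ID}")
```

This keeps a single code path for both constrained deployments (packed runtime installed) and ordinary evaluation environments (transformers only).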

Good for

  • Edge and On-Device Deployments: Ideal for applications requiring minimal memory footprint and fast inference on constrained hardware.
  • Research and Evaluation: Provides a robust platform for studying the impact of advanced quantization techniques on LLM performance.
  • Pre-purchase Evaluation: Enterprises can use this model for evaluating UltraCompress technology before considering a commercial license.

This model is intended for research and evaluation purposes, with specific licensing terms for commercial use. It inherits the base model's characteristics and limitations, and users should conduct their own evaluations before production deployment.