SipsaLabs/qwen3-1.7b-uc2p79

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 29, 2026 · Architecture: Transformer · Cold

SipsaLabs/qwen3-1.7b-uc2p79 is a 1.7 billion parameter Qwen3-based language model from Sipsa Labs, Inc. that features patent-pending UltraCompress row-overlay quantization, achieving an effective 2.7767 bits per weight. This model is optimized for efficient inference in edge and on-device deployments, offering significant compression while maintaining high quality. It is designed for research and evaluation, providing a highly compressed variant of the Qwen3-1.7B base model.


UltraCompress Qwen3-1.7B: Highly Compressed LLM for Efficient Inference

SipsaLabs/qwen3-1.7b-uc2p79 is a compressed variant of the Qwen/Qwen3-1.7B model, developed by Sipsa Labs, Inc. It utilizes their patent-pending UltraCompress low-rank correction overlay method, achieving an impressive 2.7767 bits per weight (bpw). This results in a significantly smaller model footprint, with the packed binary (model.uc.bin) being approximately 491 MB compared to the FP16 reconstruction at ~3.3 GB.
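The quoted file sizes imply the compression ratio directly. A minimal sketch of the arithmetic, using the approximate sizes stated above (actual on-disk byte counts may differ slightly):

```python
# Rough size arithmetic from the figures quoted above.
packed_bytes = 491e6   # model.uc.bin, ~491 MB
fp16_bytes = 3.3e9     # FP16 reconstruction, ~3.3 GB

ratio = fp16_bytes / packed_bytes
print(f"Packed binary is ~{ratio:.1f}x smaller than FP16")  # ~6.7x
```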

Key Capabilities

  • Extreme Compression: Achieves sub-3 bpw compression using row-overlay quantization, making it highly efficient for resource-constrained environments.
  • Quality Retention: Demonstrates non-catastrophic degradation across a 6-model cohort; Qwen3-1.7B retains 93.81% of its FP16 baseline (T1 retention) on WikiText-103 perplexity. On HellaSwag (n=200), performance is statistically indistinguishable from the FP16 baseline.
  • Scalable Retention: UltraCompress's retention scales positively with model size, showing a 2.2x steeper scaling slope compared to bitsandbytes NF4.
  • Dual Format Support: Available as a standard model.safetensors for transformers compatibility and a highly packed model.uc.bin for use with the ultracompress runtime.
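The dual-format layout suggests a simple loading strategy: prefer the packed binary when the ultracompress runtime is available, and fall back to the standard safetensors file for plain transformers loading otherwise. A minimal sketch, assuming the runtime ships as an importable `ultracompress` Python package (the package name is an assumption; consult the runtime's own documentation):

```python
import importlib.util

REPO_ID = "SipsaLabs/qwen3-1.7b-uc2p79"

def pick_format() -> str:
    """Prefer the packed model.uc.bin when the (assumed) ultracompress
    package is importable; otherwise fall back to model.safetensors,
    which loads with standard transformers tooling."""
    if importlib.util.find_spec("ultracompress") is not None:
        return "model.uc.bin"
    return "model.safetensors"

print(f"Would load {pick_format()} from {REPO_ID}")
```

This keeps a single code path for both constrained deployments (packed runtime installed) and ordinary evaluation environments (transformers only).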

Good for

  • Edge and On-Device Deployments: Ideal for applications requiring minimal memory footprint and fast inference on constrained hardware.
  • Research and Evaluation: Provides a robust platform for studying the impact of advanced quantization techniques on LLM performance.
  • Pre-purchase Evaluation: Enterprises can use this model for evaluating UltraCompress technology before considering a commercial license.

This model is intended for research and evaluation purposes, with specific licensing terms for commercial use. It inherits the base model's characteristics and limitations, and users should conduct their own evaluations before production deployment.