MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01

Text Generation · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01 is a 1.7-billion-parameter Qwen3-based language model, compressed with the llmcompressor framework using SparseGPT. The model has 50% unstructured sparsity, produced with a block size of 128 and a dampening fraction of 0.01, making it suitable for efficient deployment in resource-constrained environments. It is designed for general language tasks where reduced model size and faster inference matter.


Overview

This model, MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01, is a compressed version of the Qwen/Qwen3-1.7B base model. It was created with the llmcompressor framework, using the one-shot SparseGPT pruning method to zero out half of the model's weights.

Compression Details

  • Base Model: Qwen/Qwen3-1.7B
  • Compression Method: SparseGPT
  • Sparsity: Achieves 50% unstructured sparsity.
  • Block Size: Compression was performed with a block size of 128.
  • Dampening Fraction: A dampening fraction of 0.01 was applied during the compression process.
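The settings above map directly onto llmcompressor's SparseGPT modifier. The sketch below shows how such a compression run could look; it is a hedged reconstruction, not the author's actual script. The import paths, the `SparseGPTModifier`/`oneshot` argument names, and the calibration dataset and sample count are assumptions (the card does not state them, and exact APIs vary between llmcompressor versions):

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

# Recipe mirroring the settings encoded in this model's name:
# 50% unstructured sparsity, block size 128, dampening fraction 0.01.
recipe = SparseGPTModifier(
    sparsity=0.5,          # unstruct_0.5: fraction of weights pruned per layer
    block_size=128,        # bs128: column block size for the Hessian-based updates
    dampening_frac=0.01,   # damp0.01: regularizes the Hessian inverse
    targets="Linear",      # assumed: prune the linear projection layers
    ignore=["lm_head"],    # assumed: keep the output head dense
)

# One-shot pruning over a small calibration set (no fine-tuning pass).
oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",        # hypothetical calibration dataset
    recipe=recipe,
    max_seq_length=2048,            # illustrative calibration settings
    num_calibration_samples=512,
    output_dir="Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01",
)
```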

Key Characteristics

  • Reduced Size: 50% of the weights are zero, which lowers memory consumption when the checkpoint is stored in a sparsity-aware compressed format.
  • Efficient Inference: Pruned models can offer faster inference than their dense counterparts on runtimes with sparsity support, making them suitable for latency-sensitive applications.
  • Qwen3 Architecture: Retains the foundational capabilities of the Qwen3-1.7B architecture, adapted for efficiency.
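Concretely, "unstructured" sparsity means the zeroed weights can fall anywhere in a tensor, rather than following a fixed pattern such as 2:4 blocks. The NumPy sketch below illustrates only the resulting sparsity pattern via simple magnitude pruning; SparseGPT itself is more sophisticated (it uses Hessian-based updates to compensate for pruning error), so this is an illustration, not the actual algorithm:

```python
import numpy as np

def prune_unstructured(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of entries, anywhere in the tensor."""
    k = int(weights.size * sparsity)                 # number of entries to prune
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
w_sparse = prune_unstructured(w, 0.5)
print(f"sparsity: {np.mean(w_sparse == 0):.2f}")    # prints sparsity: 0.50
```

Note that the storage and speed benefits listed above only materialize when the runtime or file format actually exploits these zeros; a dense BF16 tensor full of zeros occupies the same memory as one without.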

Use Cases

This model is particularly well-suited for scenarios such as:

  • Resource-constrained deployment: Running on edge devices or in environments with limited compute and memory.
  • Cost-effective hosting: Reducing operational costs associated with model hosting and inference.
  • Higher throughput: Applications requiring high-speed processing of language tasks.
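The checkpoint loads like any other Qwen3 model through the Hugging Face `transformers` library. A minimal sketch (assumes a `transformers` version with Qwen3 support and downloads the weights on first run, so it is not something to run in CI):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the card's BF16 precision
    device_map="auto",
)

inputs = tokenizer("Unstructured sparsity is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```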