MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05

Text Generation · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05 is a language model of roughly 2 billion total parameters, derived from Qwen3-1.7B and compressed using the llmcompressor framework. SparseGPT pruning was applied with an unstructured sparsity of 0.6, a block size of 64, and a dampening fraction of 0.05, yielding a more efficient and potentially faster alternative to the dense base model. It is intended for scenarios where reduced model size and compute cost are critical, while aiming to retain performance on general language tasks.


Overview

This model, MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05, is a compressed version of the Qwen/Qwen3-1.7B base model. It was created using the llmcompressor framework, specifically employing the SparseGPT compression technique. The primary goal of this compression is to reduce the model's size and computational requirements, making it more efficient for deployment in resource-constrained environments.
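
Because unstructured pruning zeroes weights in place rather than changing tensor shapes, the claimed sparsity can be checked directly after loading. The sketch below is illustrative and assumes the checkpoint stores weights densely (llmcompressor can also save in a compressed-tensors format, in which case the count applies after decompression on load):

```python
# Illustrative check: fraction of exactly-zero weights in the decoder's
# Linear layers, which should be close to the advertised 0.6 sparsity.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05",
    torch_dtype=torch.bfloat16,
)

zeros = total = 0
for name, module in model.named_modules():
    # Pruning targeted Linear layers inside the decoder blocks, so skip
    # embeddings and the output head.
    if isinstance(module, torch.nn.Linear) and name.startswith("model.layers"):
        zeros += (module.weight == 0).sum().item()
        total += module.weight.numel()

print(f"Measured decoder Linear sparsity: {zeros / total:.2%}")  # expect ~60%
```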

Compression Details

The compression process applied llmcompressor modifiers to the base Qwen3-1.7B model. Key parameters for this specific experiment, named SparseGPT_unstruct_0.6_bs64_damp0.05, include (a hedged recipe sketch follows the list):

  • Sparsity: An unstructured sparsity of 0.6 was applied, meaning 60% of the targeted weights were set to zero.
  • Block Size: A block_size of 64 was used by the SparseGPT solver when reconstructing the remaining weights.
  • Dampening Fraction: A dampening_frac of 0.05 was set, which dampens the Hessian estimate to keep the weight updates numerically stable.
  • Targets: The compression specifically targeted Linear layers within the Qwen3DecoderLayer modules.
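
For reference, the experiment name maps naturally onto a one-shot llmcompressor recipe. The sketch below is a reconstruction rather than the author's published script: the import paths follow current llmcompressor conventions and may differ between versions, and the calibration dataset, sequence length, and sample count are assumptions not stated in this card.

```python
# Hedged reconstruction of the recipe implied by the experiment name.
# NOTE: dataset, max_seq_length, and num_calibration_samples are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

recipe = SparseGPTModifier(
    sparsity=0.6,         # unstructured: 60% of targeted weights pruned
    block_size=64,        # column block size for the SparseGPT solver
    dampening_frac=0.05,  # dampens the Hessian estimate for stability
    targets=["Linear"],   # Linear layers inside the Qwen3DecoderLayer modules
    ignore=["lm_head"],   # assumption: the output head is typically left dense
)

oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",      # assumption: calibration data not stated
    recipe=recipe,
    max_seq_length=2048,          # assumption
    num_calibration_samples=512,  # assumption
)
```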

Potential Use Cases

This compressed model is particularly suitable for applications where:

  • Resource Efficiency is Key: Its reduced effective size and potentially lower inference cost make it a candidate for edge devices or other environments with limited compute.
  • Faster Inference is Desired: Unstructured sparsity can yield faster inference than the dense base model when the runtime or hardware exploits zero weights.
  • General Language Tasks: While compressed, it aims to retain capability on common natural language processing tasks, offering a balance between performance and efficiency (see the loading sketch below).
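
To try the model, standard Hugging Face transformers usage should apply, since unstructured-sparse BF16 weights typically load like a dense checkpoint. The snippet below is a hedged sketch; the prompt and generation settings are illustrative.

```python
# Minimal loading and generation sketch (illustrative, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain unstructured weight sparsity in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that sparsity alone does not guarantee speedups on common GPU stacks; a runtime that exploits zero weights is needed to realize the efficiency gains in practice.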