MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05

Text Generation · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05 is a language model of roughly 2 billion total parameters, derived from Qwen3-1.7B and compressed using the llmcompressor framework. SparseGPT pruning was applied with an unstructured sparsity of 0.6, a block size of 64, and a dampening fraction of 0.05, yielding a more efficient and potentially faster alternative to the dense base model. It is intended for scenarios where reduced model size and compute cost are critical, while aiming to retain performance on general language tasks.


Overview

This model, MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05, is a compressed version of the Qwen/Qwen3-1.7B base model. It was created using the llmcompressor framework, specifically employing the SparseGPT compression technique. The primary goal of this compression is to reduce the model's size and computational requirements, making it more efficient for deployment in resource-constrained environments.
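
Because unstructured pruning zeroes weights in place rather than changing tensor shapes, the claimed sparsity can be checked directly after loading. The sketch below is illustrative and assumes the checkpoint stores weights densely (llmcompressor can also save in a compressed-tensors format, in which case the count applies after decompression on load):

```python
# Illustrative check: fraction of exactly-zero weights in the decoder's
# Linear layers, which should be close to the advertised 0.6 sparsity.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05",
    torch_dtype=torch.bfloat16,
)

zeros = total = 0
for name, module in model.named_modules():
    # Pruning targeted Linear layers inside the decoder blocks, so skip
    # embeddings and the output head.
    if isinstance(module, torch.nn.Linear) and name.startswith("model.layers"):
        zeros += (module.weight == 0).sum().item()
        total += module.weight.numel()

print(f"Measured decoder Linear sparsity: {zeros / total:.2%}")  # expect ~60%
```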

Compression Details

The compression process applied llmcompressor modifiers to the base Qwen3-1.7B model. Key parameters for this specific experiment, named SparseGPT_unstruct_0.6_bs64_damp0.05, include (a hedged recipe sketch follows the list):

  • Sparsity: An unstructured sparsity of 0.6 was applied, meaning 60% of the targeted weights were set to zero.
  • Block Size: A block_size of 64 was used by the SparseGPT solver when reconstructing the remaining weights.
  • Dampening Fraction: A dampening_frac of 0.05 was set, which dampens the Hessian estimate to keep the weight updates numerically stable.
  • Targets: The compression specifically targeted Linear layers within the Qwen3DecoderLayer modules.
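
For reference, the experiment name maps naturally onto a one-shot llmcompressor recipe. The sketch below is a reconstruction rather than the author's published script: the import paths follow current llmcompressor conventions and may differ between versions, and the calibration dataset, sequence length, and sample count are assumptions not stated in this card.

```python
# Hedged reconstruction of the recipe implied by the experiment name.
# NOTE: dataset, max_seq_length, and num_calibration_samples are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

recipe = SparseGPTModifier(
    sparsity=0.6,         # unstructured: 60% of targeted weights pruned
    block_size=64,        # column block size for the SparseGPT solver
    dampening_frac=0.05,  # dampens the Hessian estimate for stability
    targets=["Linear"],   # Linear layers inside the Qwen3DecoderLayer modules
    ignore=["lm_head"],   # assumption: the output head is typically left dense
)

oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",      # assumption: calibration data not stated
    recipe=recipe,
    max_seq_length=2048,          # assumption
    num_calibration_samples=512,  # assumption
)
```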

Potential Use Cases

This compressed model is particularly suitable for applications where:

  • Resource Efficiency is Key: Its reduced effective size and potentially lower inference cost make it a candidate for edge devices or other environments with limited compute.
  • Faster Inference is Desired: Unstructured sparsity can yield faster inference than the dense base model when the runtime or hardware exploits zero weights.
  • General Language Tasks: While compressed, it aims to retain capability on common natural language processing tasks, offering a balance between performance and efficiency (see the loading sketch below).
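
To try the model, standard Hugging Face transformers usage should apply, since unstructured-sparse BF16 weights typically load like a dense checkpoint. The snippet below is a hedged sketch; the prompt and generation settings are illustrative.

```python
# Minimal loading and generation sketch (illustrative, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.6_bs64_damp0.05"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain unstructured weight sparsity in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that sparsity alone does not guarantee speedups on common GPU stacks; a runtime that exploits zero weights is needed to realize the efficiency gains in practice.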