MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01

Text Generation · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01 is a 1.7-billion-parameter Qwen3-based language model, compressed with the llmcompressor framework using SparseGPT. The model has 50% unstructured sparsity, produced with a block size of 128 and a dampening fraction of 0.01, making it suitable for efficient deployment in resource-constrained environments. It is designed for general language tasks where reduced model size and faster inference matter.


Overview

This model, MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01, is a compressed version of the Qwen/Qwen3-1.7B base model. It was created with the llmcompressor framework, using the one-shot SparseGPT pruning method to zero out half of the model's weights.

Compression Details

  • Base Model: Qwen/Qwen3-1.7B
  • Compression Method: SparseGPT
  • Sparsity: Achieves 50% unstructured sparsity.
  • Block Size: Compression was performed with a block size of 128.
  • Dampening Fraction: A dampening fraction of 0.01 was applied during the compression process.
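The settings above map directly onto llmcompressor's SparseGPT modifier. The sketch below shows how such a compression run could look; it is a hedged reconstruction, not the author's actual script. The import paths, the `SparseGPTModifier`/`oneshot` argument names, and the calibration dataset and sample count are assumptions (the card does not state them, and exact APIs vary between llmcompressor versions):

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

# Recipe mirroring the settings encoded in this model's name:
# 50% unstructured sparsity, block size 128, dampening fraction 0.01.
recipe = SparseGPTModifier(
    sparsity=0.5,          # unstruct_0.5: fraction of weights pruned per layer
    block_size=128,        # bs128: column block size for the Hessian-based updates
    dampening_frac=0.01,   # damp0.01: regularizes the Hessian inverse
    targets="Linear",      # assumed: prune the linear projection layers
    ignore=["lm_head"],    # assumed: keep the output head dense
)

# One-shot pruning over a small calibration set (no fine-tuning pass).
oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",        # hypothetical calibration dataset
    recipe=recipe,
    max_seq_length=2048,            # illustrative calibration settings
    num_calibration_samples=512,
    output_dir="Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01",
)
```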

Key Characteristics

  • Reduced Size: 50% of the weights are zero, which lowers memory consumption when the checkpoint is stored in a sparsity-aware compressed format.
  • Efficient Inference: Pruned models can offer faster inference than their dense counterparts on runtimes with sparsity support, making them suitable for latency-sensitive applications.
  • Qwen3 Architecture: Retains the foundational capabilities of the Qwen3-1.7B architecture, adapted for efficiency.
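Concretely, "unstructured" sparsity means the zeroed weights can fall anywhere in a tensor, rather than following a fixed pattern such as 2:4 blocks. The NumPy sketch below illustrates only the resulting sparsity pattern via simple magnitude pruning; SparseGPT itself is more sophisticated (it uses Hessian-based updates to compensate for pruning error), so this is an illustration, not the actual algorithm:

```python
import numpy as np

def prune_unstructured(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of entries, anywhere in the tensor."""
    k = int(weights.size * sparsity)                 # number of entries to prune
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
w_sparse = prune_unstructured(w, 0.5)
print(f"sparsity: {np.mean(w_sparse == 0):.2f}")    # prints sparsity: 0.50
```

Note that the storage and speed benefits listed above only materialize when the runtime or file format actually exploits these zeros; a dense BF16 tensor full of zeros occupies the same memory as one without.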

Use Cases

This model is particularly well-suited for scenarios such as:

  • Resource-constrained deployment: Running on edge devices or in environments with limited compute and memory.
  • Cost-effective hosting: Reducing operational costs associated with model hosting and inference.
  • Higher throughput: Applications requiring high-speed processing of language tasks.
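The checkpoint loads like any other Qwen3 model through the Hugging Face `transformers` library. A minimal sketch (assumes a `transformers` version with Qwen3 support and downloads the weights on first run, so it is not something to run in CI):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.5_bs128_damp0.01"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the card's BF16 precision
    device_map="auto",
)

inputs = tokenizer("Unstructured sparsity is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```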