MilyaShams/Qwen3-1.7B-SparseGPT_4_8
MilyaShams/Qwen3-1.7B-SparseGPT_4_8 is a 1.7-billion-parameter language model based on the Qwen3 architecture, compressed with the llmcompressor framework. The model was pruned with SparseGPT using a 4:8 sparsity mask, which zeroes four of every eight consecutive weights for a 50% sparsity level. It is intended for efficient deployment and inference in resource-constrained environments while preserving much of the performance of the base Qwen3-1.7B model.
Model Overview
MilyaShams/Qwen3-1.7B-SparseGPT_4_8 is a compressed version of the Qwen3-1.7B language model, developed by MilyaShams. It has been optimized for efficiency with the llmcompressor framework, specifically its implementation of the SparseGPT one-shot pruning technique.
Compression Details
The compression process applied a 4:8 sparsity mask, yielding a 50% sparsity level, to the Linear layers inside each Qwen3DecoderLayer. The aim is to reduce the model's computational footprint and memory requirements without significant performance degradation. The base model for this compression was Qwen/Qwen3-1.7B.
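The exact recipe for this checkpoint has not been published, but a typical llmcompressor one-shot SparseGPT run looks like the minimal sketch below. The calibration dataset, sample counts, and `ignore` list are illustrative assumptions, and exact import paths and argument names can vary across llmcompressor versions.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

# Illustrative recipe: SparseGPT pruning with a 4:8 semi-structured mask.
recipe = SparseGPTModifier(
    sparsity=0.5,           # 50% of weights pruned in each targeted layer
    mask_structure="4:8",   # 4:8 pattern: four of every eight consecutive weights zeroed
    targets=["Linear"],     # prune the Linear submodules of the decoder layers
    ignore=["lm_head"],     # assumption: leave the output head dense, a common choice
)

# One-shot calibration + pruning; dataset and sizes here are placeholders.
oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```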
Key Characteristics
- Base Architecture: Qwen3-1.7B
- Parameter Count: Approximately 1.7 billion parameters; 4:8 pruning zeroes weights rather than removing them, so the count matches the base model
- Compression Method: SparseGPT with a 4:8 mask structure
- Sparsity: 50% across the targeted Linear layers
- Context Length: 32,768 tokens (inherited from the base model)
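For reference, the checkpoint loads through the standard transformers workflow. A minimal sketch follows; the prompt and generation settings are illustrative, and the compressed-tensors package is assumed to be installed so transformers can decompress the sparse weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-SparseGPT_4_8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative chat-style prompt using the Qwen3 chat template.
messages = [{"role": "user", "content": "Briefly explain 4:8 structured sparsity."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```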
Use Cases
This model is particularly well-suited for applications where computational resources are limited, such as edge devices or high-throughput inference scenarios. Its 50% sparsity makes it an efficient choice for tasks that benefit from the Qwen3 architecture's capabilities but demand a smaller, faster model footprint.
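For high-throughput serving, the checkpoint can be loaded in vLLM like any Hugging Face model. A minimal sketch with illustrative sampling parameters is shown below; note that realized memory and latency gains from 4:8 sparsity depend on whether the deployment stack has kernels that accelerate this pattern, and the model may otherwise run with dense execution.

```python
from vllm import LLM, SamplingParams

# Load the compressed checkpoint directly from the Hub.
llm = LLM(model="MilyaShams/Qwen3-1.7B-SparseGPT_4_8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize the benefits of weight sparsity."], params)
print(outputs[0].outputs[0].text)
```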