MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1

Text generation · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1 is a compressed variant of the roughly 1.7-billion-parameter Qwen3-1.7B language model, produced with the llmcompressor framework. It retains the base model's 32,768-token context length and applies SparseGPT unstructured sparsity at a 70% ratio. It is intended for deployments where reduced memory footprint and faster inference are critical.


Overview

This model, MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1, is a compressed version of the Qwen/Qwen3-1.7B base model. It has roughly 1.7 billion parameters and retains the base model's 32,768-token context length. Compression was performed with the llmcompressor framework using the SparseGPT method.
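As a rough sketch of the potential savings, assuming ~1.7B parameters, BF16 weights at 2 bytes each, and ignoring the index overhead that a real sparse storage format would add:

```python
# Back-of-envelope memory math for a ~1.7B-parameter model in BF16.
params = 1.7e9
bytes_per_param = 2  # BF16 uses 2 bytes per weight
dense_gb = params * bytes_per_param / 1e9

sparsity = 0.7
# With 70% of weights zeroed, a compressed format only needs values
# for the remaining ~30% (index overhead ignored in this sketch).
sparse_gb = dense_gb * (1 - sparsity)
print(f"dense ≈ {dense_gb:.1f} GB, sparse values ≈ {sparse_gb:.2f} GB")
```

Note that a checkpoint stored densely (zeros written out as BF16 values) sees none of this saving; it materializes only with a sparsity-aware storage format or runtime.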

Compression Details

  • Base Model: Qwen/Qwen3-1.7B
  • Compression Method: SparseGPT with unstructured sparsity.
  • Sparsity Ratio: 70% (meaning 70% of the model's weights have been pruned).
  • Block Size: 128
  • Dampening Fraction: 0.1
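The parameters above map onto an llmcompressor-style pruning recipe. The following is a hypothetical sketch only: the stage and modifier layout follows llmcompressor's `SparseGPTModifier`, but exact recipe keys vary by version, so consult the framework's documentation before use.

```yaml
# Hypothetical llmcompressor recipe sketch (keys may differ by version)
sparsity_stage:
  sparsity_modifiers:
    SparseGPTModifier:
      sparsity: 0.7          # 70% of weights pruned
      block_size: 128        # solver block size (the "bs128" in the model name)
      dampening_frac: 0.1    # Hessian dampening (the "damp0.1" in the model name)
      mask_structure: "0:0"  # unstructured (no N:M pattern)
```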

Key Characteristics

  • Reduced Size: With 70% of weights pruned to zero, the stored weight footprint drops substantially when a sparsity-aware storage format is used.
  • Efficient Inference: Can run faster than the dense base model on runtimes whose kernels exploit unstructured sparsity, making it suitable for resource-constrained environments.
  • High Context Length: Retains the original 32,768-token context window, allowing long inputs to be processed.
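To make the 70% figure concrete, here is a minimal toy sketch of unstructured pruning with NumPy. It uses simple magnitude ranking for illustration only; SparseGPT itself selects and compensates weights using second-order (Hessian-based) information, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))  # toy weight matrix

sparsity = 0.7
k = int(sparsity * W.size)
# Zero out the k smallest-magnitude weights anywhere in the matrix
# ("unstructured": no row/column/block pattern is enforced).
idx = np.argsort(np.abs(W), axis=None)[:k]
W_pruned = W.copy()
W_pruned.flat[idx] = 0.0

zero_frac = float((W_pruned == 0.0).mean())
print(f"fraction of zeroed weights: {zero_frac:.3f}")
```

Because any individual weight may be zeroed, unstructured sparsity preserves accuracy better than structured patterns at the same ratio, but it is also harder for standard dense kernels to accelerate.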

Potential Use Cases

This model is particularly well-suited for applications requiring:

  • Edge device deployment: Where computational resources and memory are limited.
  • High-throughput inference: For scenarios demanding quick responses from the model.
  • Cost-effective solutions: Reducing operational costs associated with larger, dense models.

It offers a balance between output quality and efficiency, making it a strong candidate for NLP tasks where the full capacity of a dense model is unnecessary.