MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1
MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1 is a 1.7 billion parameter language model based on the Qwen3-1.7B architecture, compressed using the llmcompressor framework. The model keeps the base model's 32768-token context length and applies SparseGPT unstructured sparsity at a 70% ratio. It targets deployments where reduced memory footprint and faster inference are critical.
Overview
This model, MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1, is a compressed version of the Qwen/Qwen3-1.7B base model. It has roughly 1.7 billion parameters and retains the base model's 32768-token context window. Compression was performed with the llmcompressor framework, specifically its SparseGPT implementation.
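A minimal loading sketch with Hugging Face transformers is shown below; the prompt and generation settings are illustrative and not part of this card.

```python
# Minimal sketch: load the compressed checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # place weights on available GPU(s)/CPU
)

inputs = tokenizer("Briefly explain unstructured sparsity.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```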
Compression Details
- Base Model: Qwen/Qwen3-1.7B
- Compression Method: SparseGPT with unstructured sparsity.
- Sparsity Ratio: 70% (70% of the weights in the pruned layers are set to zero).
- Block Size: 128
- Dampening Fraction: 0.1
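For reference, below is a hedged sketch of how a checkpoint like this can be produced with llmcompressor's SparseGPT one-shot pruning. The sparsity, block size, and dampening values mirror the model name; the calibration dataset, sample count, and sequence length are assumptions not taken from this card, and exact import paths can vary across llmcompressor versions.

```python
# Hedged reproduction sketch, not the exact command used for this model.
from llmcompressor import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

recipe = SparseGPTModifier(
    sparsity=0.7,          # prune 70% of weights
    block_size=128,        # column block size used during pruning
    dampening_frac=0.1,    # Hessian dampening fraction
    mask_structure="0:0",  # "0:0" = unstructured sparsity
)

oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",   # assumed calibration set
    recipe=recipe,
    output_dir="Qwen3-1.7B-sparsegpt-0.7",
    max_seq_length=2048,              # assumed
    num_calibration_samples=512,      # assumed
)
```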
Key Characteristics
- Reduced Size: With 70% of weights pruned to zero, the memory footprint drops substantially when the checkpoint is stored or executed in a sparsity-aware format.
- Efficient Inference: Can run faster than its dense counterpart on runtimes with sparse-kernel support, making it suitable for resource-constrained environments.
- High Context Length: Retains the original 32768-token context window, so long inputs can still be processed.
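One way to sanity-check the sparsity claim, assuming the checkpoint loads as dense tensors with pruned weights stored as zeros (embeddings and lm_head are typically excluded from pruning, so the measured fraction may land slightly below 70%):

```python
# Illustrative check: fraction of zero weights across Linear layers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1",
    torch_dtype="auto",
)

zeros, total = 0, 0
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        w = module.weight
        zeros += (w == 0).sum().item()
        total += w.numel()

print(f"Zero fraction in Linear weights: {zeros / total:.2%}")
```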
Potential Use Cases
This model is particularly well-suited for applications requiring:
- Edge device deployment: Where computational resources and memory are limited.
- High-throughput inference: For serving many requests at once; see the vLLM sketch at the end of this card.
- Cost-effective solutions: Reducing operational costs associated with larger, dense models.
It offers a balance between quality and efficiency, making it a strong candidate for NLP tasks where the full capacity of a dense model is unnecessary.
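As one example of a high-throughput setup, the model can be served with vLLM. This is a generic sketch; whether sparse kernels are actually engaged depends on the vLLM version and the checkpoint's storage format.

```python
# Hedged sketch: batch generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="MilyaShams/Qwen3-1.7B-SparseGPT_unstruct_0.7_bs128_damp0.1")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the benefits of pruned LLMs."], params)
print(outputs[0].outputs[0].text)
```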