MilyaShams/Qwen3-1.7B-SparseGPT_4_8
MilyaShams/Qwen3-1.7B-SparseGPT_4_8 is a 1.7-billion-parameter language model based on the Qwen3 architecture, compressed with the llmcompressor framework. The model was pruned with SparseGPT using a 4:8 sparsity mask, which zeroes four of every eight consecutive weights for a 50% sparsity level. It is intended for efficient deployment and inference in resource-constrained environments while preserving much of the performance of the base Qwen3-1.7B model.
Model Overview
MilyaShams/Qwen3-1.7B-SparseGPT_4_8 is a compressed version of the Qwen3-1.7B language model, developed by MilyaShams. It has been optimized for efficiency with the llmcompressor framework, specifically its implementation of the SparseGPT one-shot pruning technique.
Compression Details
The compression process applied a 4:8 sparsity mask, yielding a 50% sparsity level, to the Linear layers inside each Qwen3DecoderLayer. The aim is to reduce the model's computational footprint and memory requirements without significant performance degradation. The base model for this compression was Qwen/Qwen3-1.7B.
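The exact recipe for this checkpoint has not been published, but a typical llmcompressor one-shot SparseGPT run looks like the minimal sketch below. The calibration dataset, sample counts, and `ignore` list are illustrative assumptions, and exact import paths and argument names can vary across llmcompressor versions.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

# Illustrative recipe: SparseGPT pruning with a 4:8 semi-structured mask.
recipe = SparseGPTModifier(
    sparsity=0.5,           # 50% of weights pruned in each targeted layer
    mask_structure="4:8",   # 4:8 pattern: four of every eight consecutive weights zeroed
    targets=["Linear"],     # prune the Linear submodules of the decoder layers
    ignore=["lm_head"],     # assumption: leave the output head dense, a common choice
)

# One-shot calibration + pruning; dataset and sizes here are placeholders.
oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```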
Key Characteristics
- Base Architecture: Qwen3-1.7B
- Parameter Count: Approximately 1.7 billion parameters; 4:8 pruning zeroes weights rather than removing them, so the count matches the base model
- Compression Method: SparseGPT with a 4:8 mask structure
- Sparsity: 50% across the targeted Linear layers
- Context Length: 32,768 tokens (inherited from the base model)
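For reference, the checkpoint loads through the standard transformers workflow. A minimal sketch follows; the prompt and generation settings are illustrative, and the compressed-tensors package is assumed to be installed so transformers can decompress the sparse weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-SparseGPT_4_8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative chat-style prompt using the Qwen3 chat template.
messages = [{"role": "user", "content": "Briefly explain 4:8 structured sparsity."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```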
Use Cases
This model is particularly well-suited for applications where computational resources are limited, such as edge devices or high-throughput inference scenarios. Its 50% sparsity makes it an efficient choice for tasks that benefit from the Qwen3 architecture's capabilities but demand a smaller, faster model footprint.
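For high-throughput serving, the checkpoint can be loaded in vLLM like any Hugging Face model. A minimal sketch with illustrative sampling parameters is shown below; note that realized memory and latency gains from 4:8 sparsity depend on whether the deployment stack has kernels that accelerate this pattern, and the model may otherwise run with dense execution.

```python
from vllm import LLM, SamplingParams

# Load the compressed checkpoint directly from the Hub.
llm = LLM(model="MilyaShams/Qwen3-1.7B-SparseGPT_4_8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize the benefits of weight sparsity."], params)
print(outputs[0].outputs[0].text)
```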