MilyaShams/Qwen3-1.7B-Wanda_4_8
MilyaShams/Qwen3-1.7B-Wanda_4_8 is a 1.7 billion parameter language model based on the Qwen3 architecture, compressed using the llmcompressor framework. It keeps the 32768 token context length of the original Qwen3-1.7B and is pruned with a 4:8 sparsity mask structure, so 4 of every 8 consecutive weights in each pruned layer are zero. It is intended for deployment scenarios where reduced model size and computational efficiency are critical.
Overview
This model, MilyaShams/Qwen3-1.7B-Wanda_4_8, is a compressed version of the Qwen/Qwen3-1.7B base model, developed by MilyaShams. It was produced with the llmcompressor framework, which prunes weights to reduce the compute and storage cost while aiming to retain the core capabilities of the original 1.7 billion parameter Qwen3 architecture. The compression applied a Wanda_4_8 recipe, which uses a 4:8 mask structure for sparsity and targets the Linear layers within Qwen3DecoderLayer modules.
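To make the recipe more concrete, the following is a minimal sketch of Wanda-style 4:8 pruning on a toy weight matrix. It is not the llmcompressor implementation and not the exact recipe used for this model. It assumes the standard Wanda importance score (absolute weight times the L2 norm of the matching input activation) and a 4:8 mask applied along the input dimension of each output row. All function names (wanda_scores, column_norms, prune_n_of_m) and the toy data are hypothetical.

```python
import math

def wanda_scores(weight, act_norms):
    """Score each weight as |W[i][j]| * ||X_j||_2 (assumed Wanda metric)."""
    return [[abs(w) * n for w, n in zip(row, act_norms)] for row in weight]

def column_norms(activations):
    """L2 norm of each input feature over the calibration samples."""
    return [math.sqrt(sum(x * x for x in col)) for col in zip(*activations)]

def prune_n_of_m(weight, act_norms, n_zero=4, m=8):
    """Zero the n_zero lowest-scoring weights in every group of m
    consecutive input columns, independently for each output row."""
    scores = wanda_scores(weight, act_norms)
    pruned = [list(row) for row in weight]
    for i, row_scores in enumerate(scores):
        if len(row_scores) % m != 0:
            raise ValueError("row length must be a multiple of m")
        for start in range(0, len(row_scores), m):
            group = range(start, start + m)
            # Indices sorted by ascending score; the first n_zero are pruned.
            drop = sorted(group, key=lambda j: row_scores[j])[:n_zero]
            for j in drop:
                pruned[i][j] = 0.0
    return pruned

# Toy data (hypothetical): one output row, eight input features,
# two calibration samples. Feature 3 has much larger activations.
W = [[0.9, -0.1, 0.5, 0.08, -0.7, 0.2, 0.3, -0.4]]
X = [[1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0]]

pruned = prune_n_of_m(W, column_norms(X))
print(pruned)  # [[0.9, 0.0, 0.5, 0.08, -0.7, 0.0, 0.0, 0.0]]
```

In this toy case the weight 0.08 survives although its magnitude is small, because its input feature has a large activation norm, while -0.4 is pruned. That is the difference between an activation-aware score and plain magnitude pruning.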
Key Characteristics
- Base Model: Qwen/Qwen3-1.7B, a 1.7 billion parameter language model.
- Compression Method: Uses the llmcompressor framework with a Wanda_4_8 recipe.
- Sparsity: Implements a 4:8 mask structure, in which 4 of every 8 consecutive weights in a pruned layer are set to zero.
- Context Length: Maintains the original 32768 token context window, suitable for processing longer sequences.
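As an illustration of what the 4:8 mask structure means in practice, here is a small sketch that checks whether rows of weights follow the pattern and reports the resulting sparsity. It assumes groups are formed from consecutive entries along each row. The helper names (is_n_of_m_sparse, sparsity) and the sample rows are hypothetical and not part of llmcompressor.

```python
def is_n_of_m_sparse(row, n_zero=4, m=8):
    """True if every group of m consecutive entries has at least n_zero zeros."""
    if len(row) % m != 0:
        return False
    return all(
        sum(1 for v in row[s:s + m] if v == 0) >= n_zero
        for s in range(0, len(row), m)
    )

def sparsity(rows):
    """Fraction of entries that are exactly zero."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / total

# Hypothetical rows that both satisfy the 4:8 pattern.
rows = [
    [0.3, 0.0, 0.0, -1.2, 0.0, 0.7, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.5, -0.6, 0.1, 0.9],
]
# A row with only three zeros in its group of eight does not.
bad_row = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]

print(all(is_n_of_m_sparse(r) for r in rows))  # True
print(is_n_of_m_sparse(bad_row))               # False
print(sparsity(rows))                          # 0.5
```

The second row places all four zeros in the first half of the group. A 4:8 mask allows this, whereas a 2:4 mask would require two zeros in each group of four, so 4:8 gives the pruning step more freedom at the same 50% sparsity.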
Use Cases
This compressed model is particularly well-suited for applications requiring:
- Efficient Deployment: Its reduced size and optimized structure make it ideal for environments with limited computational resources or strict latency requirements.
- Edge Devices: Potentially beneficial for deployment on edge devices or mobile applications where smaller models are preferred.
- Research in Model Compression: Serves as a practical example of applying structured sparsity techniques to large language models.