MilyaShams/Qwen3-1.7B-Wanda_4_8

Task: Text Generation · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

MilyaShams/Qwen3-1.7B-Wanda_4_8 is a pruned version of Qwen3-1.7B, a 1.7 billion parameter language model. It was produced with the llmcompressor framework. The model keeps the base model's 32,768-token context length and applies a 4:8 sparsity mask, which sets 4 of every 8 consecutive weights to zero (50% sparsity) in the targeted layers. It is intended for deployment scenarios where memory and compute budgets are tight. Actual savings depend on how the sparse weights are stored and on whether the inference runtime exploits the sparsity pattern.


Overview

This model, MilyaShams/Qwen3-1.7B-Wanda_4_8, is a compressed version of the Qwen/Qwen3-1.7B base model, published by MilyaShams. It was produced with the llmcompressor framework using a recipe named Wanda_4_8. That recipe prunes the Linear layers inside Qwen3DecoderLayer modules with a 4:8 mask structure. The aim is a smaller computational footprint that preserves as much of the original 1.7 billion parameter model's capability as possible.
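The card does not show llmcompressor's internals. The pure-Python sketch below is a rough illustration of what a Wanda 4:8 recipe computes. It assumes the standard Wanda criterion, which scores each weight as |W_ij| · ||X_j||_2 (weight magnitude times the L2 norm of its input feature over calibration data). Within every group of 8 consecutive input columns of each row, it zeros the 4 lowest-scoring weights. The function names `activation_norms` and `wanda_nm_prune` are hypothetical. The library's real implementation may differ in details such as how activation statistics are accumulated.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def activation_norms(calib_inputs: Sequence[Sequence[float]]) -> List[float]:
    """L2 norm of each input feature across calibration tokens (one row per token)."""
    n_features = len(calib_inputs[0])
    return [math.sqrt(sum(tok[j] ** 2 for tok in calib_inputs)) for j in range(n_features)]


def wanda_nm_prune(weight: Matrix, act_norms: Sequence[float], n: int = 4, m: int = 8) -> Matrix:
    """Return a copy of `weight` (out_features x in_features) pruned to N:M sparsity.

    In every group of m consecutive input columns of each row, the n weights
    with the lowest Wanda score |W_ij| * ||X_j||_2 are set to zero.
    """
    in_features = len(act_norms)
    if in_features % m != 0:
        raise ValueError("in_features must be divisible by m")
    pruned: Matrix = []
    for row in weight:
        new_row = list(row)
        for start in range(0, in_features, m):
            # Rank this group's column indices from lowest to highest score.
            ranked = sorted(range(start, start + m), key=lambda j: abs(row[j]) * act_norms[j])
            for j in ranked[:n]:
                new_row[j] = 0.0
        pruned.append(new_row)
    return pruned


# Usage: feature 3 has a large activation norm, so its small weight survives.
norms = activation_norms([[1, 1, 1, 10, 1, 1, 1, 1]])
W = [[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6]]
print(wanda_nm_prune(W, norms))  # [[0.9, 0.0, 0.0, 0.05, -0.7, 0.0, 0.0, -0.6]]
```

The example shows the difference from plain magnitude pruning. The weight 0.05 is the smallest in its group, but it is kept because its input channel carries large activations. The weight 0.4 is larger and is pruned instead.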

Key Characteristics

  • Base Model: Qwen/Qwen3-1.7B, a 1.7 billion parameter language model.
  • Compression Method: The llmcompressor framework with the Wanda_4_8 recipe.
  • Sparsity: A 4:8 mask structure. Within every group of 8 consecutive weights, 4 are set to zero, giving 50% sparsity in the pruned layers.
  • Context Length: Retains the base model's 32,768-token context window, suitable for processing longer sequences.
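One way to confirm that a layer follows the 4:8 pattern described above is to count zeros per group of 8 consecutive weights. The sketch below is a minimal check on nested Python lists. The helper names are hypothetical. It accepts groups with more than 4 zeros, because some weights may already have been zero before pruning. To use it with a PyTorch layer from the loaded model, a list can be obtained with `linear.weight.float().tolist()`. Which module you pick is up to you.

```python
from typing import Sequence


def satisfies_nm(weight: Sequence[Sequence[float]], n: int = 4, m: int = 8) -> bool:
    """True if every group of m consecutive entries in each row has at least n zeros."""
    for row in weight:
        if len(row) % m != 0:
            return False
        for start in range(0, len(row), m):
            if sum(1 for v in row[start:start + m] if v == 0) < n:
                return False
    return True


def sparsity(weight: Sequence[Sequence[float]]) -> float:
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weight)
    zeros = sum(1 for row in weight for v in row if v == 0)
    return zeros / total


layer = [[0.9, 0.0, 0.0, 0.05, -0.7, 0.0, 0.0, -0.6],
         [0.0, 0.3, 0.0, 0.0, 0.1, 0.0, -0.2, 0.4]]
print(satisfies_nm(layer), sparsity(layer))  # True 0.5
```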

Use Cases

This compressed model is particularly well-suited for applications requiring:

  • Efficient Deployment: The 50% sparsity pattern can lower memory and compute requirements in environments with limited resources or strict latency requirements. This holds when the weights are stored in a sparse format and the inference runtime exploits the pattern.
  • Edge Devices: Potentially useful on edge devices or in mobile applications where smaller models are preferred.
  • Research in Model Compression: A practical example of applying semi-structured (N:M) sparsity to a large language model.
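The memory benefit listed under Efficient Deployment depends on the storage layout. If the zeros are stored explicitly in dense BF16, a pruned layer takes as much space as the original. The sketch below compares that dense layout with a hypothetical bitmask layout, which stores only the nonzero BF16 values plus one bit per weight to mark their positions. The layer shape is illustrative and not read from this model's config. The arithmetic applies only to pruned Linear layers; embeddings and other unpruned tensors stay dense.

```python
import math


def dense_bytes(num_weights: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to store every weight densely (BF16 = 2 bytes per value)."""
    return num_weights * bytes_per_value


def bitmask_bytes(num_weights: int, density: float, bytes_per_value: int = 2) -> int:
    """Bytes for a hypothetical layout: nonzero values plus a 1-bit-per-weight mask."""
    nonzero_values = round(num_weights * density) * bytes_per_value
    mask = math.ceil(num_weights / 8)  # one bit per weight, packed into bytes
    return nonzero_values + mask


# Hypothetical 2048 x 6144 Linear layer at 50% density (4:8 sparsity).
n = 2048 * 6144
print(dense_bytes(n), bitmask_bytes(n, 0.5))  # 25165824 14155776
```

Under these assumptions, the bitmask layout needs about 56% of the dense size for each pruned layer. Any speedup is a separate question, because it depends on kernels that can skip the zeroed weights.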