MilyaShams/Qwen3-1.7B-Wanda_1_4

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 8, 2026 | Architecture: Transformer

MilyaShams/Qwen3-1.7B-Wanda_1_4 is a 1.7-billion-parameter language model derived from the Qwen/Qwen3-1.7B base model. It was compressed with the llmcompressor framework using the Wanda_1_4 recipe, which prunes weights under a 1:4 mask structure. The model is intended for efficient deployment and inference while retaining the core language capabilities of the base model.

Model Overview

MilyaShams/Qwen3-1.7B-Wanda_1_4 is a compressed version of the Qwen/Qwen3-1.7B language model, with approximately 1.7 billion parameters. It was processed with the llmcompressor framework, which aims to reduce model size and computational requirements while preserving performance.

Compression Details

The compression process, identified as the Wanda_1_4 experiment, involved applying specific modifiers to the base Qwen3-1.7B model. Key aspects of this compression include:

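  • Sparsity: A sparsity level of 0.25 was applied, so roughly 25% of the weights in the targeted layers are set to zero.
  • Mask Structure: The pruning used a 1:4 mask structure, which, together with the 0.25 sparsity level, corresponds to zeroing one weight in every group of four. Wanda is a weight-pruning method, not a quantization method.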
  • Target Layers: The compression primarily targeted Linear layers within the model, with sequential updates applied to Qwen3DecoderLayer components.
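
The 1:4 pattern described above can be illustrated with a small, dependency-free sketch. It is not the llmcompressor implementation: the function names, the toy weight matrix, and the activation norms are all made up for illustration. It assumes the usual Wanda importance score (weight magnitude times the L2 norm of the matching input feature) and zeroes the lowest-scoring weight in each group of four along a row.

```python
from typing import List


def wanda_prune_n_of_m(
    weights: List[List[float]],
    act_norms: List[float],
    n: int = 1,
    m: int = 4,
) -> List[List[float]]:
    """Zero the n lowest-scoring weights in every group of m along each row.

    The score is |weight| * act_norms[j], where act_norms[j] stands in for
    the L2 norm of input feature j over a calibration set (hypothetical here).
    """
    pruned = []
    for row in weights:
        if len(row) != len(act_norms) or len(row) % m != 0:
            raise ValueError("row length must match act_norms and be divisible by m")
        new_row = list(row)
        for start in range(0, len(row), m):
            cols = range(start, start + m)
            # Rank the group's columns by importance, least important first.
            ranked = sorted(cols, key=lambda j: abs(row[j]) * act_norms[j])
            for j in ranked[:n]:
                new_row[j] = 0.0
        pruned.append(new_row)
    return pruned


def sparsity(weights: List[List[float]]) -> float:
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Toy 2x8 weight matrix (rows = outputs, columns = inputs) and made-up norms.
W = [
    [0.5, -0.1, 0.3, 0.9, -0.25, 0.4, 0.05, -0.7],
    [-0.6, 0.2, 0.1, -0.3, 0.8, -0.05, 0.3, 0.4],
]
X_NORMS = [1.0, 4.0, 1.0, 1.0, 2.0, 1.0, 12.0, 1.0]

W_PRUNED = wanda_prune_n_of_m(W, X_NORMS, n=1, m=4)
print(sparsity(W_PRUNED))  # 0.25
```

In the first row, the weight -0.1 has the smallest magnitude in its group but survives because its input feature has a large norm; 0.3 is zeroed instead. That is the difference between this score and plain magnitude pruning.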

Use Cases

This compressed model is suited to resource-constrained scenarios, such as edge devices or applications that need faster inference. Its reduced footprint makes it an efficient choice for tasks that can be served by a smaller, optimized language model derived from the Qwen3 architecture.
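
For trying the model out, a minimal sketch using the Hugging Face transformers library might look like the following. It assumes the checkpoint loads through the standard `AutoModelForCausalLM` API and that a transformers release with Qwen3 support is installed; if the weights are stored in a compressed format, an additional package such as compressed-tensors may be required. The helper name `load_and_generate` and its defaults are hypothetical.

```python
MODEL_ID = "MilyaShams/Qwen3-1.7B-Wanda_1_4"


def load_and_generate(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Load the model and return a completion for a single prompt."""
    # Imported lazily so that defining this helper does not pull in the
    # heavy dependencies until it is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint (listed as BF16).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(load_and_generate("Explain weight pruning in one sentence."))
```

Loading this way runs the model as ordinary dense tensors; whether the pruned weights translate into memory or latency savings depends on the runtime used.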