MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.4

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 8, 2026 | Architecture: Transformer

MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.4 is a 1.7-billion-parameter language model based on the Qwen3 architecture and compressed with the llmcompressor framework. Unstructured pruning at 40% sparsity was applied to its linear layers. It is intended for applications that need a lighter model while retaining as much of the Qwen3 base model's capability as possible.


Overview

This model, MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.4, is a compressed version of the Qwen/Qwen3-1.7B base model. It was created with the llmcompressor framework using the Wanda_unstruct_0.4 experiment recipe.

Compression Details

The compression process applied 40% unstructured sparsity to the model's linear layers: roughly 40% of the weights in those layers are set to zero, with no constraint on where the zeros fall. This can reduce storage and compute requirements when the weights are stored in a sparse format and run on kernels that exploit sparsity, which may make the model more practical to deploy in resource-constrained environments.
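To make the idea of unstructured pruning concrete, here is a minimal pure-Python sketch. It assumes, based only on the recipe name, that the pruning follows the Wanda criterion: each weight is scored by its magnitude times the L2 norm of the matching input feature over some calibration data, and the lowest-scoring weights in each output row are zeroed. The function name, the toy data, and the per-row grouping are illustrative assumptions; this is not the llmcompressor implementation or the exact recipe used for this model.

```python
import math

def wanda_prune_row_wise(weight, activations, sparsity=0.4):
    """Zero the lowest-scoring fraction of weights in each output row.

    weight:      list of rows, shape [out_features][in_features]
    activations: calibration inputs, shape [n_samples][in_features]
    Hypothetical helper for illustration only.
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across the calibration samples.
    feat_norm = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(in_features)
    ]
    n_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        # Wanda-style importance score: |weight| * input feature norm.
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Toy example: one output row, one calibration sample.
toy_weight = [[0.1, -2.0, 0.5, 1.0, -0.3]]
toy_inputs = [[1.0, 1.0, 1.0, 1.0, 10.0]]
toy_pruned = wanda_prune_row_wise(toy_weight, toy_inputs, sparsity=0.4)
print(toy_pruned)
```

In this toy case the small weight -0.3 survives because its input feature is large, whereas pruning by magnitude alone would have removed it. The matrix keeps its shape; only the values change.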

Key Characteristics

  • Base Model: Qwen/Qwen3-1.7B, which provides the architecture and the pretrained weights that were pruned.
  • Parameter Count: Approximately 1.7 billion parameters. Unstructured pruning zeroes individual weights and leaves tensor shapes unchanged, so this count includes the zeroed weights.
  • Compression Method: Unstructured pruning at 40% sparsity, applied to the Linear layers within each Qwen3DecoderLayer.
  • Framework: Compressed using llmcompressor, a library for applying compression techniques such as pruning and quantization to language models.
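The sparsity level listed above can be checked by counting zeros in a layer's weight matrix. The following sketch uses plain Python lists and a hypothetical helper name; the sample matrix is made up for illustration and is not taken from this model.

```python
def zero_fraction(matrix):
    """Return the fraction of entries in a 2-D list that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for w in row if w == 0.0)
    return zeros / total

# Made-up weight matrix with 4 zeros out of 10 entries.
sample_layer = [
    [0.0, -2.0, 0.0, 1.0, -0.3],
    [0.7, 0.0, -1.1, 0.0, 0.2],
]
layer_sparsity = zero_fraction(sample_layer)
print(f"sparsity: {layer_sparsity:.0%}")
```

For a PyTorch weight tensor `w`, the same quantity can be computed as `(w == 0).float().mean()`. A value near 0.4 on the decoder's linear layers would be consistent with the description above.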

Potential Use Cases

This compressed model is suitable for scenarios where:

  • Resource efficiency is critical, such as edge devices or applications with strict memory/compute budgets.
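  • Faster inference is desired and the serving stack can exploit unstructured sparsity; with ordinary dense kernels, the zeroed weights are still multiplied and little speedup should be expected.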
  • Leveraging the capabilities of the Qwen3 architecture in a more lightweight package is beneficial.
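As a starting point for trying the model, the sketch below loads it through the Hugging Face transformers library and generates a short completion. It assumes the checkpoint loads with the standard `AutoModelForCausalLM` path and that `transformers` and `torch` are installed; if the weights were saved in a compressed format, an additional package may be required. The function name, prompt, and generation settings are illustrative choices, not recommendations from the model author.

```python
MODEL_ID = "MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.4"

def generate_text(prompt, max_new_tokens=64):
    """Download the model (on first call) and return a generated completion."""
    # Imported lazily so that defining this function needs no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the full sequence (prompt plus continuation) back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Explain weight pruning in one sentence.")` downloads the weights on first use and returns the prompt followed by the model's continuation.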