MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.6

Text generation · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.6 is a 1.7-billion-parameter language model based on the Qwen3 architecture and compressed with the llmcompressor framework. It supports a 32768-token context length and has been pruned to 0.6 unstructured sparsity using the Wanda method, making it suited to deployments where reduced model size and computational cost are critical.


Model Overview

MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.6 is a compressed version of the Qwen/Qwen3-1.7B base model, developed by MilyaShams. It was produced with the llmcompressor framework using the Wanda unstructured pruning method at a 0.6 sparsity ratio. The goal of this compression is to reduce the model's size and computational requirements while retaining as much of the base model's performance as possible.
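
The model can be loaded like any other Hugging Face checkpoint. The snippet below is a minimal usage sketch with the transformers library; it assumes the repository ships the standard Qwen3 chat template, and the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch with Hugging Face transformers (settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",
)

# Build a chat-formatted prompt (assumes the standard Qwen3 chat template).
messages = [{"role": "user", "content": "Explain unstructured weight sparsity in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```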

Key Characteristics

  • Base Model: Qwen/Qwen3-1.7B
  • Parameter Count: Approximately 1.7 billion parameters
  • Context Length: 32768-token (32k) context window
  • Compression Method: Wanda unstructured sparsity at a 0.6 ratio, applied to the Linear layers of each Qwen3DecoderLayer (see the recipe sketch below)
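
For reference, the sketch below shows how a Wanda unstructured-sparsity pass like this one is typically expressed with llmcompressor's one-shot API. It is a reconstruction, not the author's script: the calibration dataset and sample budget are assumptions, and exact import paths and argument names may differ across llmcompressor versions.

```python
# Hypothetical reconstruction of a Wanda 0.6 unstructured-sparsity recipe with llmcompressor.
# Import paths and argument names may differ across llmcompressor versions.
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.pruning import WandaPruningModifier

recipe = WandaPruningModifier(
    sparsity=0.6,        # 0.6 unstructured sparsity ratio (no N:M mask structure)
    targets=["Linear"],  # prune the Linear layers inside the decoder layers
)

oneshot(
    model="Qwen/Qwen3-1.7B",
    dataset="open_platypus",      # calibration set is an assumption; the card does not state one
    recipe=recipe,
    output_dir="./Qwen3-1.7B-Wanda_unstruct_0.6",
    num_calibration_samples=512,  # illustrative calibration budget
    max_seq_length=2048,
)
```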

Use Cases

This model is well suited to applications that need a capable language model with a smaller footprint and lower inference cost. Its compression makes it a good fit for the scenarios below (a minimal serving sketch follows the list):

  • Deployment on edge devices or environments with limited computational resources.
  • Scenarios where cost-effective scaling of LLM inference is a priority.
  • Tasks benefiting from a large context window, such as long-form content generation or complex document analysis, within a resource-constrained setting.
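
As one example of cost-effective serving, the sketch below loads the model with vLLM. This deployment path is an assumption rather than something the card prescribes, and whether the 0.6 sparsity translates into memory or latency savings depends on how the pruned weights are stored and on kernel support in the serving stack.

```python
# Minimal serving sketch with vLLM (deployment path is an assumption, not prescribed by the card).
from vllm import LLM, SamplingParams

llm = LLM(model="MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.6", dtype="bfloat16")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Summarize the key findings of a long technical report on renewable energy."],
    sampling,
)
print(outputs[0].outputs[0].text)
```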