Name: MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: MilyaShams

Model Overview

This model, MilyaShams/Qwen3-1.7B-Wanda_unstruct_0.5, is a compressed version of Qwen/Qwen3-1.7B. It was produced with the llmcompressor framework, which provides compression techniques such as pruning and quantization to reduce model size and improve inference efficiency.

Compression Details

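The main difference from the original model is its compression strategy. It was produced with the Wanda_unstruct_0.5 recipe, which applies unstructured pruning at a sparsity of 0.5: half of the weights in the targeted layers, the Linear layers within the Qwen3DecoderLayer modules, are set to zero. Parameters outside this target (for example the embeddings and normalization layers) are not pruned, so the fraction of zeros across the whole model is below 50%. The compression aims to achieve a smaller footprint and potentially faster inference without significant degradation in quality compared to the original 1.7-billion-parameter Qwen3 model; the actual size and speed gains depend on whether the weights are stored in a sparse format and served with sparsity-aware kernels.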
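To illustrate the scoring rule behind Wanda, here is a minimal pure-Python sketch; it is not llmcompressor's implementation. It assumes a weight matrix W with one row per output feature (as in a torch.nn.Linear weight) and a small calibration batch X with one row per token. Each weight is scored as |W[i][j]| multiplied by the L2 norm of input feature j over the calibration tokens, and the lowest-scoring half of each output row is zeroed. The function name wanda_prune and the toy matrices are hypothetical.

```python
import math


def wanda_prune(W, X, sparsity=0.5):
    """Return a copy of W with the lowest-scoring weights in each row set to 0.

    Wanda-style score: |W[i][j]| * ||X[:, j]||_2, compared within each
    output row (hypothetical helper for illustration only).
    """
    n_in = len(W[0])
    # L2 norm of each input feature's activations across calibration tokens.
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(n_in)]
    k = int(n_in * sparsity)  # number of weights to drop per output row
    pruned = []
    for w_row in W:
        scores = [abs(w) * col_norms[j] for j, w in enumerate(w_row)]
        # Indices of the k smallest scores in this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned


if __name__ == "__main__":
    # Toy weights: 2 output features, 4 input features.
    W = [[0.5, -0.1, 0.3, 0.05],
         [-0.2, 0.4, -0.6, 0.1]]
    # Toy activations: feature 0 is small, feature 3 is large.
    X = [[0.1, 1.0, 1.0, 10.0],
         [0.1, 1.0, 1.0, 10.0]]
    print(wanda_prune(W, X))  # → [[0.0, 0.0, 0.3, 0.05], [0.0, 0.0, -0.6, 0.1]]
```

In the toy example the largest weight in the first row (0.5) is pruned because its input feature has small activations, while the small weight 0.05 survives because its input is large; plain magnitude pruning would make the opposite choice for those two weights.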

Key Characteristics

Base Architecture: Qwen3-1.7B
Parameter Count: Approximately 1.7 billion (unchanged by pruning; half of the targeted Linear-layer weights are zero)
Compression Method: Wanda (Pruning by Weights and activations)
Sparsity: 0.5 (50% unstructured sparsity in the targeted layers)
Targeted Layers: Linear layers within Qwen3DecoderLayer
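The 0.5 figure can be checked on an exported weight matrix by counting exact zeros. The sketch below uses hypothetical helper names (sparsity, is_2_4) and represents weights as nested Python lists. It also tests for the 2:4 semi-structured pattern, in which every group of four consecutive weights contains at least two zeros; that is the pattern NVIDIA sparse tensor cores accelerate, whereas unstructured sparsity can place zeros anywhere and generally needs dedicated sparse kernels to yield speedups. It assumes row lengths are multiples of four.

```python
def sparsity(W):
    """Fraction of exactly-zero entries in a weight matrix (nested lists)."""
    total = sum(len(row) for row in W)
    zeros = sum(1 for row in W for w in row if w == 0.0)
    return zeros / total


def is_2_4(W):
    """True if every group of 4 consecutive weights in each row has >= 2 zeros.

    Assumes each row length is a multiple of 4.
    """
    for row in W:
        for start in range(0, len(row), 4):
            group = row[start:start + 4]
            if sum(1 for w in group if w == 0.0) < 2:
                return False
    return True


if __name__ == "__main__":
    # 4 of 8 weights are zero, but the second group of four has only one zero.
    W = [[0.0, 0.0, 0.0, 0.05, 0.1, 0.2, 0.3, 0.0]]
    print(sparsity(W), is_2_4(W))  # → 0.5 False
```

The example row is 50% sparse yet fails the 2:4 check, which is the usual outcome of unstructured pruning.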

Potential Use Cases

This compressed model is suitable for applications where computational resources or deployment size are critical constraints. It can be particularly useful for:

Edge device deployment
Applications requiring faster inference, provided the serving stack can exploit unstructured sparsity
Scenarios where slightly reduced output quality is acceptable in exchange for significant resource savings
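Deployment-size savings depend on how the checkpoint is stored, which this page does not state: a dense checkpoint with zeroed weights is the same size as the original. The back-of-the-envelope sketch below compares dense storage with one common sparse layout, assumed here for illustration, that keeps a 1-bit mask per position plus the nonzero values. The function storage_bytes and the 16-bit value width (bf16/fp16) are hypothetical choices, not the model's documented format.

```python
def storage_bytes(n_weights, sparsity, bits_per_value=16, bitmask=True):
    """Estimate bytes needed to store one weight tensor (hypothetical model).

    dense:   every weight stored at bits_per_value.
    bitmask: 1 bit per position plus bits_per_value for each nonzero value.
    """
    if not bitmask:
        return n_weights * bits_per_value / 8
    nonzeros = n_weights * (1 - sparsity)
    return (n_weights * 1 + nonzeros * bits_per_value) / 8


if __name__ == "__main__":
    n = 1_000_000  # hypothetical tensor size
    dense = storage_bytes(n, 0.5, bitmask=False)
    sparse = storage_bytes(n, 0.5)
    print(dense, sparse, sparse / dense)  # → 2000000.0 1125000.0 0.5625
```

Under these assumptions, 50% sparsity with 16-bit values shrinks a pruned tensor to about 56% of its dense size rather than 50%, because of the mask overhead; unpruned parameters such as the embeddings see no reduction.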
