Name: MilyaShams/Qwen3-1.7B-Wanda_4_8 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: MilyaShams

Overview

MilyaShams/Qwen3-1.7B-Wanda_4_8 is a pruned version of the Qwen/Qwen3-1.7B base model, developed by MilyaShams. It was produced with the llmcompressor framework and keeps the original 1.7 billion parameter Qwen3 architecture. The compression applied a Wanda_4_8 recipe: Wanda pruning, which ranks weights by their magnitude scaled by the norm of the matching input activations, with a 4:8 mask structure, targeting Linear layers within Qwen3DecoderLayer modules.

Key Characteristics

Base Model: Qwen/Qwen3-1.7B, a 1.7 billion parameter language model.
Compression Method: Utilizes the llmcompressor framework with a Wanda_4_8 recipe.
Sparsity: Implements a 4:8 mask structure, meaning 4 of every 8 consecutive weights in each targeted Linear layer are set to zero (50% structured sparsity in those layers).
Context Length: Maintains the original 32768 token context window, suitable for processing longer sequences.
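
To make the 4:8 pattern concrete, the following is a minimal pure-Python sketch of Wanda-style scoring followed by 4:8 masking on a single weight row. It is an illustration only, not llmcompressor's implementation: the function names (wanda_scores, prune_n_of_m, satisfies_n_of_m), the toy weights, and the activation norms are all hypothetical, and the calibration data used for the actual model is not stated here.

```python
from typing import List

Matrix = List[List[float]]


def wanda_scores(weights: Matrix, act_norms: List[float]) -> Matrix:
    """Wanda-style importance: |weight| times the L2 norm of its input feature."""
    return [[abs(w) * n for w, n in zip(row, act_norms)] for row in weights]


def prune_n_of_m(weights: Matrix, act_norms: List[float],
                 n_zero: int = 4, group: int = 8) -> Matrix:
    """Zero the n_zero lowest-scoring weights in every block of `group` inputs."""
    scores = wanda_scores(weights, act_norms)
    pruned = []
    for row, row_scores in zip(weights, scores):
        if len(row) % group != 0:
            raise ValueError("row length must be a multiple of the group size")
        new_row = list(row)
        for start in range(0, len(row), group):
            block = range(start, start + group)
            # Lowest scores first; sorted() is stable, so ties keep input order.
            for i in sorted(block, key=lambda j: row_scores[j])[:n_zero]:
                new_row[i] = 0.0
        pruned.append(new_row)
    return pruned


def satisfies_n_of_m(row: List[float], n_zero: int = 4, group: int = 8) -> bool:
    """Check that every block of `group` weights holds at least n_zero zeros."""
    return all(
        sum(1 for w in row[s:s + group] if w == 0.0) >= n_zero
        for s in range(0, len(row), group)
    )


# Toy example (hypothetical values): one output row with 8 input features.
weights = [[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6]]
act_norms = [1.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

pruned = prune_n_of_m(weights, act_norms)
print(pruned[0])  # [0.9, -0.1, 0.0, 0.0, -0.7, 0.0, 0.0, -0.6]
print(satisfies_n_of_m(pruned[0]))  # True
```

In this toy row the small weight -0.1 survives because its input feature has a large activation norm, whereas pruning by magnitude alone would have removed it. That is the difference the Wanda scoring is meant to capture.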

Use Cases

This compressed model is particularly well-suited for applications requiring:

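Efficient Deployment: Half of the weights in the targeted layers are zero, which can reduce memory and compute in runtimes that exploit structured sparsity, making the model a candidate for environments with limited computational resources or strict latency requirements.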
Edge Devices: Potentially beneficial for deployment on edge devices or mobile applications where smaller models are preferred.
Research in Model Compression: Serves as a practical example of applying structured sparsity techniques to large language models.
