LiamCarter/icl-pruning-wanda-sparsity-0.2

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Apr 23, 2026 · Architecture: Transformer

The LiamCarter/icl-pruning-wanda-sparsity-0.2 model is a 7 billion parameter language model derived from the meta-llama/Llama-2-7b-hf architecture. This variant applies the Wanda pruning method at a sparsity of 0.2, i.e. roughly 20% of its weights are zeroed out as a form of model compression. It is published as a standard Hugging Face Transformers checkpoint, preserving the original local experiment files. The model is primarily differentiated from the dense base model by this induced weight sparsity, which is intended to enable efficiency gains.


Model Overview

This model, LiamCarter/icl-pruning-wanda-sparsity-0.2, is a 7 billion parameter variant derived from the meta-llama/Llama-2-7b-hf base model. It was pruned with the Wanda method at a sparsity level of 0.2, meaning roughly 20% of its weights have been set to zero in order to reduce the model's computational requirements.
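For context, Wanda (Pruning by Weights and Activations) scores each weight by the product of its magnitude and the L2 norm of the corresponding input activation, collected over a small calibration set, and zeroes out the lowest-scoring weights per output row without any retraining. The sketch below illustrates that scoring rule for a single linear layer; it is a minimal illustration, not the exact script used to produce this checkpoint, and the function and variable names are placeholders.

```python
import torch

def wanda_prune_layer(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float = 0.2) -> torch.Tensor:
    """Wanda-style pruning of one linear layer (illustrative only).

    weight:   (out_features, in_features) weight matrix
    act_norm: (in_features,) L2 norm of each input feature over a calibration set
    sparsity: fraction of weights to zero out in each output row (0.2 for this model)
    """
    # Wanda importance score: |W_ij| * ||X_j||_2
    score = weight.abs() * act_norm.unsqueeze(0)

    # For each output row, select the lowest-scoring weights to prune
    n_prune = int(weight.shape[1] * sparsity)
    _, prune_idx = torch.topk(score, n_prune, dim=1, largest=False)

    # Build a keep-mask and zero out the pruned positions (the dense shape is preserved)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    return weight * mask
```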

Key Characteristics

  • Base Architecture: meta-llama/Llama-2-7b-hf
  • Parameter Count: 7 billion
  • Pruning Method: Wanda (Pruning by Weights and Activations), a calibration-based, retraining-free pruning technique.
  • Sparsity Level: 0.2, meaning roughly 20% of the weights have been zeroed out.
  • Format: Standard Hugging Face Transformers checkpoint (see the loading sketch below).
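Because the checkpoint is published in the standard Transformers format, it should load like any other Llama-2-7b variant. The snippet below is a minimal sketch, assuming the repository loads with `AutoModelForCausalLM.from_pretrained` and that a GPU with enough memory for a 7B model is available; `device_map="auto"` additionally requires the `accelerate` package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LiamCarter/icl-pruning-wanda-sparsity-0.2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,   # half precision to fit a 7B model on a single GPU
    device_map="auto",           # requires the accelerate package
)

prompt = "In-context learning allows a language model to"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```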

Potential Use Cases

This model is particularly relevant for researchers and developers interested in:

  • Efficient deployment: Exploring the performance of pruned models, which can offer a reduced memory footprint and faster inference when sparse storage formats or sparsity-aware kernels are used.
  • Sparsity research: Investigating the impact of the Wanda pruning method at a 0.2 sparsity level on Llama-2-7b (a quick per-layer sparsity check is sketched after this list).
  • Resource-constrained environments: Evaluating its suitability, relative to the dense base model, for applications where computational resources are limited.
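For the sparsity-research use case, a quick sanity check is to measure the fraction of exactly-zero weights in each linear projection and confirm it is close to the advertised 0.2. This is a minimal sketch that assumes the pruned weights are stored as explicit zeros in the dense checkpoint; layers that Wanda typically leaves untouched (such as the embedding and output head) may pull the overall figure slightly below 0.2.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LiamCarter/icl-pruning-wanda-sparsity-0.2", torch_dtype=torch.float16
)

# Count exactly-zero weights in every linear projection
total_weights, total_zeros = 0, 0
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        w = module.weight
        zeros = (w == 0).sum().item()
        total_weights += w.numel()
        total_zeros += zeros
        print(f"{name}: {zeros / w.numel():.3f} zero fraction")

print(f"Overall zero fraction across linear layers: {total_zeros / total_weights:.3f} (expected ~0.2)")
```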