LiamCarter/icl-pruning-wanda-sparsity-0.5

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Architecture: Transformer · Published: Apr 23, 2026

LiamCarter/icl-pruning-wanda-sparsity-0.5 is a 7-billion-parameter language model derived from meta-llama/Llama-2-7b-hf by applying the Wanda pruning method at a 0.5 sparsity level. It is an experimental checkpoint for studying the effects of weight sparsity on large language models, intended for research into model compression and efficiency, in particular how pruning affects performance and resource utilization.


Overview

This model, LiamCarter/icl-pruning-wanda-sparsity-0.5, is an experimental variant of the meta-llama/Llama-2-7b-hf base model. It was pruned with the Wanda method at a 0.5 sparsity level, meaning roughly half of the weights in its linear layers have been set to zero. The repository ships a standard Hugging Face transformers checkpoint, so it loads through the usual from_pretrained workflow, though some experimental bundles may require custom code.
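As a minimal sketch of the standard loading path the card describes (assuming the checkpoint follows the usual Llama-2 configuration; the dtype, device placement, and prompt below are illustrative choices, not part of the release):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiamCarter/icl-pruning-wanda-sparsity-0.5"

# The pruned weights are stored as ordinary dense tensors with zeros in the
# pruned positions, so loading looks exactly like loading the dense base model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model on a single 24 GB GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "Pruning a language model with Wanda works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the repository turns out to bundle custom modeling code, passing `trust_remote_code=True` to `from_pretrained` may be needed.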

Key Characteristics

  • Base Model: meta-llama/Llama-2-7b-hf
  • Pruning Method: Wanda
  • Sparsity: 0.5 (roughly 50% of linear-layer weights zeroed; see the verification sketch after this list)
  • Format: Standard Hugging Face transformers-checkpoint
  • Purpose: Research into model compression and the impact of pruning on LLM performance.
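The stated 0.5 sparsity can be checked directly on the downloaded weights. The sketch below (an assumption about how the sparsity is realized: Wanda typically zeroes weights of the linear projections, leaving embeddings and norms dense) counts zero entries across nn.Linear layers:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LiamCarter/icl-pruning-wanda-sparsity-0.5",
    torch_dtype=torch.float16,
)

zero, total = 0, 0
for name, module in model.named_modules():
    # Restrict the count to linear-layer weight matrices, the tensors Wanda prunes.
    if isinstance(module, torch.nn.Linear):
        w = module.weight.detach()
        zero += (w == 0).sum().item()
        total += w.numel()

print(f"Measured sparsity over linear layers: {zero / total:.3f}")
```

For an unmodified 0.5-sparsity Wanda checkpoint the printed ratio should land close to 0.5.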

Potential Use Cases

  • Research on Model Sparsity: Ideal for studying the effects of the Wanda pruning technique.
  • Efficiency Experiments: Useful for evaluating how a 50% sparsity level influences inference speed and memory footprint.
  • Comparative Analysis: Can be used to compare the pruned Llama-2-7b checkpoint against its dense counterpart or against other pruning methods, as in the timing sketch below.
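A minimal comparison sketch for the last two points, assuming access to the gated meta-llama/Llama-2-7b-hf baseline; the prompt and token budget are arbitrary. Note that because the pruned weights are stored densely, end-to-end latency and memory will be similar to the dense model unless sparse-aware kernels are used, which is itself a useful observation in efficiency experiments:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

dense_id = "meta-llama/Llama-2-7b-hf"                    # gated; requires accepted access
pruned_id = "LiamCarter/icl-pruning-wanda-sparsity-0.5"

prompt = "In-context learning allows a language model to"

def time_generation(model_id: str) -> float:
    """Load a checkpoint and time a short greedy generation."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return time.perf_counter() - start

# On a single GPU it may be cleaner to run each model in its own process
# so the first model's memory is fully released before the second loads.
for model_id in (dense_id, pruned_id):
    print(f"{model_id}: {time_generation(model_id):.2f} s for 128 new tokens")
```

The same harness extends naturally to quality metrics (e.g., perplexity on a held-out text sample) when comparing Wanda against other pruning methods.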