LiamCarter/icl-pruning-wanda-sparsity-0.4

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 23, 2026 · Architecture: Transformer

LiamCarter/icl-pruning-wanda-sparsity-0.4 is a 7-billion-parameter language model based on the Llama-2-7b-hf architecture, developed by LiamCarter. It has been pruned with the Wanda method at a sparsity of 0.4, compressing the model for more efficient inference. It is designed for applications that need a smaller, lighter model while retaining the capabilities of its Llama-2 base.


Overview

This model, LiamCarter/icl-pruning-wanda-sparsity-0.4, is a 7-billion-parameter variant derived from the meta-llama/Llama-2-7b-hf base model. It has been pruned with the Wanda method at a sparsity level of 0.4. Wanda (Pruning by Weights and Activations) scores each weight by its magnitude multiplied by the norm of its input activation and removes the lowest-scoring weights without any retraining, reducing the model's size and computational requirements while aiming to preserve performance.
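For readers curious about the mechanics, the Wanda criterion is simple enough to sketch in a few lines. The snippet below is an illustrative re-implementation of the per-layer pruning step, not code from this repo; it assumes per-output-row comparison groups and a precomputed L2 norm of each input feature gathered from calibration activations:

```python
import torch

def wanda_prune(weight: torch.Tensor, act_norm: torch.Tensor,
                sparsity: float = 0.4) -> torch.Tensor:
    """Zero the lowest-scoring weights in each output row.

    weight:   (out_features, in_features) linear-layer weight
    act_norm: (in_features,) L2 norm of each input feature,
              collected from calibration activations
    """
    # Wanda score: weight magnitude scaled by the input activation norm
    score = weight.abs() * act_norm
    # Number of weights to remove per output row at the given sparsity
    k = int(weight.shape[1] * sparsity)
    # Indices of the k lowest-scoring weights in each row
    _, prune_idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    return weight * mask
```

Comparing scores within each output row, rather than across the whole layer, is the grouping the Wanda paper found most effective for LLMs.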

Key Characteristics

  • Base Model: meta-llama/Llama-2-7b-hf
  • Pruning Method: Wanda
  • Sparsity: 0.4 (meaning 40% of the weights have been removed or set to zero)
  • Format: Standard Hugging Face transformers checkpoint (a loading sketch follows this list)
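Because the checkpoint is stored in the standard transformers format, it should load like any other Llama-2 variant. The following is a minimal loading-and-generation sketch; the repo id is taken from the model name, and the dtype and device settings are illustrative rather than prescribed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LiamCarter/icl-pruning-wanda-sparsity-0.4"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # illustrative; choose what your hardware supports
    device_map="auto",
)

prompt = "Explain model pruning in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that Wanda produces unstructured sparsity stored as ordinary dense tensors, so loading is unchanged, but memory and latency savings depend on a runtime that can exploit the zeros.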

Potential Use Cases

This pruned model is particularly suitable for scenarios where:

  • Efficient Inference: Reduced model size and computational load are critical.
  • Resource-Constrained Environments: Deployment on devices with limited memory or processing power.
  • Exploration of Pruning Techniques: Researchers and developers interested in the impact of Wanda pruning on Llama-2 models (a quick sparsity check is sketched below).
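For the last case, a simple way to confirm the advertised sparsity is to count zero entries in the model's linear projections. This is a hypothetical diagnostic, not part of the published repo:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LiamCarter/icl-pruning-wanda-sparsity-0.4",
    torch_dtype=torch.float16,
)

zeros, total = 0, 0
for name, module in model.named_modules():
    # Count zeroed weights across all linear layers
    if isinstance(module, torch.nn.Linear):
        w = module.weight
        zeros += (w == 0).sum().item()
        total += w.numel()

print(f"Overall linear-layer sparsity: {zeros / total:.2%}")  # expect roughly 40%
```

Embedding layers and, typically, the lm_head are left dense, so the measured figure may come in slightly under the nominal 0.4.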