kettleguts/zephyr-7b-beta_sparse05

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Mar 24, 2024 | License: MIT | Architecture: Transformer | Open Weights

kettleguts/zephyr-7b-beta_sparse05 is a 7-billion-parameter language model pruned from HuggingFaceH4/zephyr-7b-beta. It uses Wanda pruning to introduce 50% sparsity into its linear layers, trading generation quality for a smaller effective parameter count. The model is intended for research into network pruning and model compression rather than production use. Because of the heavy pruning, its text generation quality is highly dependent on prompting, and it can behave more like a smaller model.


Model Overview

kettleguts/zephyr-7b-beta_sparse05 is a 7 billion parameter language model derived from HuggingFaceH4/zephyr-7b-beta. This version has undergone Wanda pruning, introducing 50% sparsity into its linear layers. The pruning technique is detailed in the paper "A Simple and Effective Pruning Approach for Large Language Models" (arXiv:2306.11695).
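
This repository does not ship the pruning code, but the Wanda criterion from the cited paper is simple to state: each weight W_ij is scored by |W_ij| multiplied by the L2 norm of the j-th input activation over a calibration set, and the lowest-scoring 50% of weights within each output row are zeroed. The sketch below is a minimal, hypothetical illustration of that scoring step; `wanda_mask` and `act_norm` are illustrative names, not part of this repository.

```python
import torch

def wanda_mask(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Return a keep-mask for `weight` using the Wanda criterion.

    weight:   (out_features, in_features) linear-layer weight
    act_norm: (in_features,) L2 norm of each input feature, measured
              over a calibration batch
    """
    # Score each weight by |W_ij| * ||X_j||_2 (the Wanda importance metric).
    score = weight.abs() * act_norm.unsqueeze(0)
    # Within each output row, drop the lowest-scoring fraction of weights.
    n_prune = int(weight.shape[1] * sparsity)
    _, prune_idx = torch.topk(score, n_prune, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    return mask

# Applying the mask zeroes the pruned weights in place:
# layer.weight.data *= wanda_mask(layer.weight.data, act_norm)
```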

Key Characteristics

  • Pruned Architecture: Features 50% sparsity in linear layers through Wanda pruning, aiming for efficiency.
  • Research Focus: Primarily intended for research into model compression and the impact of pruning on LLMs.
  • Performance Considerations: Due to heavy pruning, its text generation quality is highly sensitive to prompting and may resemble that of a smaller model; see the prompting sketch after this list.
  • Environmental Impact: The pruning process for this model required less than 1 hour on a T4 GPU.
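
Because the base model was instruction-tuned on chat-formatted data, routing prompts through the tokenizer's chat template is a sensible starting point for coaxing usable generations out of the pruned checkpoint. Below is a minimal sketch using the standard transformers API, assuming this checkpoint retains the base model's chat template; the system message and generation settings are just examples.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kettleguts/zephyr-7b-beta_sparse05"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# zephyr-7b-beta was tuned on chat-formatted data, so use the chat
# template rather than raw text prompts.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain network pruning in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```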

Intended Use Cases

  • Research: Ideal for studying network pruning, model sparsity, and their effects on large language models.
  • Experimentation: Suitable for exploring how heavily pruned models behave and can be optimized; a sparsity-inspection sketch follows this list.
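
One quick experiment is to verify the advertised 50% sparsity directly by measuring the fraction of exactly-zero weights in each linear layer. A small sketch, assuming the pruned weights are stored as literal zeros in the checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "kettleguts/zephyr-7b-beta_sparse05", torch_dtype=torch.float16
)

# Report the fraction of exactly-zero weights in every linear layer.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        w = module.weight.data
        print(f"{name}: {(w == 0).float().mean().item():.2%} zero")
```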

Limitations and Risks

  • Research Only: Not suitable for direct use in production or critical applications.
  • No Safeguards: Lacks built-in safeguards, inheriting potential biases and risks from its base model.
  • Quality Variability: Text generation quality can be inconsistent and highly dependent on the prompt.