wang7776/Llama-2-7b-chat-hf-10-sparsity

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Dec 11, 2023 · License: Other · Architecture: Transformer

wang7776/Llama-2-7b-chat-hf-10-sparsity is a 7 billion parameter variant of Meta's Llama 2 Chat model with 10% sparsity applied through Wanda pruning, which requires no retraining. The pruning aims to reduce model size while maintaining competitive performance. The underlying model is fine-tuned for dialogue and chat applications in English and supports a context length of 4096 tokens.


Overview

This model, wang7776/Llama-2-7b-chat-hf-10-sparsity, is a 7 billion parameter variant of Meta's Llama 2 Chat model that has been pruned to 10% sparsity using the Wanda method. Wanda removes weights without any additional retraining or weight updates, aiming to reduce memory and compute cost while preserving the base model's performance.
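The checkpoint ships with the pruning already applied, but the scoring rule behind Wanda is simple enough to sketch. The snippet below is a minimal, hypothetical illustration (the function name and calibration setup are not taken from this repository): each weight is scored by its magnitude times the L2 norm of the corresponding input feature over a small calibration set, and the lowest-scoring 10% of weights in each output row are zeroed.

```python
import torch

def wanda_prune_linear(weight: torch.Tensor,
                       calib_inputs: torch.Tensor,
                       sparsity: float = 0.10) -> torch.Tensor:
    """Sketch of Wanda pruning for one linear layer (no retraining).

    weight:       (out_features, in_features) layer weight matrix.
    calib_inputs: (n_tokens, in_features) activations seen by this layer
                  while running a small calibration set through the model.
    sparsity:     fraction of weights to zero in each output row.
    """
    # Wanda score: |W_ij| * ||X_j||_2, i.e. weight magnitude scaled by the
    # L2 norm of the j-th input feature across the calibration tokens.
    feature_norms = calib_inputs.norm(p=2, dim=0)        # (in_features,)
    scores = weight.abs() * feature_norms.unsqueeze(0)   # (out, in)

    pruned = weight.clone()
    n_prune = int(weight.shape[1] * sparsity)
    if n_prune > 0:
        # Zero the n_prune lowest-scoring weights in every output row.
        low_idx = scores.topk(n_prune, dim=1, largest=False).indices
        pruned.scatter_(1, low_idx, 0.0)
    return pruned
```

Because the scores depend only on existing weights and a forward pass over calibration data, the procedure leaves all remaining weights untouched, which is why no fine-tuning step is needed afterwards.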

Key Capabilities & Features

  • Base Model: Built upon the Llama 2 7B Chat model, which is optimized for dialogue use cases.
  • Sparsity: Incorporates 10% sparsity via Wanda pruning, which can reduce memory footprint and potentially speed up inference (see the loading sketch after this list).
  • Architecture: Uses an optimized transformer architecture; the base chat model was tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) for helpfulness and safety.
  • Context Length: Supports a context length of 4096 tokens.
  • Language: English; intended for commercial and research use.
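As a quick way to confirm the advertised sparsity level, the hedged sketch below loads the checkpoint with the Hugging Face transformers library and counts zero-valued entries in the linear weight matrices; the exact figure will depend on which layers were pruned.

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "wang7776/Llama-2-7b-chat-hf-10-sparsity"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Count zero-valued weights in the 2-D (linear) weight matrices only.
zeros, total = 0, 0
for name, param in model.named_parameters():
    if param.dim() == 2 and "weight" in name:
        zeros += (param == 0).sum().item()
        total += param.numel()

print(f"Measured sparsity over linear weights: {zeros / total:.2%}")
```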

Intended Use Cases

  • Dialogue Applications: Optimized for assistant-like chat functionality (a minimal inference sketch follows this list).
  • Research: Suitable for research into sparse models and their performance characteristics.
  • Commercial Use: Permitted under a custom commercial license from Meta.
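
For assistant-style chat, a minimal generation sketch might look like the following; it assumes the repository keeps the standard Llama 2 chat template in its tokenizer configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Llama-2-7b-chat-hf-10-sparsity"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a single-turn chat prompt using the tokenizer's chat template.
messages = [{"role": "user", "content": "Give me three tips for writing clear commit messages."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```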