wang7776/Llama-2-7b-chat-hf-10-attention-sparsity
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 26, 2024 · License: other · Architecture: Transformer

The wang7776/Llama-2-7b-chat-hf-10-attention-sparsity model is a 7 billion parameter generative text model based on Meta's Llama 2-Chat, which is fine-tuned for dialogue use cases. This variant has been pruned to 10% attention sparsity using the Wanda method, which aims to maintain competitive performance without retraining. It uses an optimized transformer architecture and supports a 4096-token context length, making it suitable for efficient chat applications.


Overview

This model is a 7 billion parameter variant of Meta's Llama 2-Chat, optimized for dialogue. Its key differentiator is the application of 10% attention sparsity using the Wanda pruning method, which removes weights in a single pass without any additional retraining or weight updates. The result is potentially faster, lighter inference while still aiming for performance competitive with the dense counterpart.
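To illustrate the idea behind Wanda, the sketch below scores each weight by the product of its magnitude and the L2 norm of its input activation over a calibration batch, then zeroes the lowest-scoring fraction per output row. This is a minimal illustration, not the repository's pruning code; the function name and NumPy implementation are assumptions for demonstration.

```python
import numpy as np

def wanda_prune(weights, activations, sparsity=0.10):
    """Sketch of Wanda-style pruning: zero the lowest-scoring weights
    per output row, scored by |W_ij| * ||X_j||_2, with no retraining."""
    # L2 norm of each input feature across the calibration samples
    feature_norms = np.linalg.norm(activations, axis=0)   # (in_features,)
    scores = np.abs(weights) * feature_norms              # (out, in)

    pruned = weights.copy()
    k = int(weights.shape[1] * sparsity)  # weights to drop per row
    if k > 0:
        # Indices of the k lowest scores in each row
        idx = np.argpartition(scores, k, axis=1)[:, :k]
        np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned
```

Because the comparison is local to each output row and needs only one forward pass of calibration data for the activation norms, the method avoids the gradient computation or iterative retraining that magnitude-pruning-plus-finetuning pipelines require.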

Key Capabilities

  • Dialogue Optimization: Fine-tuned for assistant-like chat applications.
  • Efficient Inference: Benefits from 10% attention sparsity, potentially leading to reduced computational requirements.
  • Llama 2 Foundation: Built upon the robust Llama 2 architecture, which uses an optimized transformer and incorporates supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) for helpfulness and safety.
  • 4096-token Context: Supports a standard context window for conversational tasks.
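Since the variant retains Llama 2-Chat's dialogue fine-tuning, it expects the standard Llama 2 chat prompt template, with `[INST]`/`[/INST]` tags and an optional `<<SYS>>` block embedded in the first user turn. A minimal single-turn formatter (the function name is illustrative, not part of the model's tooling):

```python
def build_llama2_prompt(user_msg, system_msg=None):
    """Format a single-turn prompt in the Llama 2 chat template."""
    if system_msg:
        # The system prompt is wrapped in <<SYS>> tags inside the first user turn
        user_msg = f"<<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg}"
    return f"[INST] {user_msg} [/INST]"
```

For example, `build_llama2_prompt("Hello!")` yields `[INST] Hello! [/INST]`; multi-turn conversations repeat the tag pair per exchange.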

Good For

  • Developers seeking a more efficient version of Llama 2-7b-chat for dialogue generation.
  • Applications where computational resources are constrained and a balance between performance and efficiency is desired.
  • Research into the effects and benefits of pruning techniques like Wanda on large language models.