wang7776/Llama-2-7b-chat-hf-10-sparsity

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Dec 11, 2023 · License: Other · Architecture: Transformer

wang7776/Llama-2-7b-chat-hf-10-sparsity is a 7 billion parameter variant of Meta's Llama 2 Chat model with 10% sparsity applied through Wanda pruning, which requires no retraining. The pruning aims to reduce model size while maintaining competitive performance. The underlying model is fine-tuned for dialogue and chat applications in English and supports a context length of 4096 tokens.


Overview

This model, wang7776/Llama-2-7b-chat-hf-10-sparsity, is a 7 billion parameter variant of Meta's Llama 2 Chat model that has been pruned to 10% sparsity using the Wanda method. Wanda removes weights without any additional retraining or weight updates, aiming to reduce memory and compute cost while preserving the base model's performance.
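The checkpoint ships with the pruning already applied, but the scoring rule behind Wanda is simple enough to sketch. The snippet below is a minimal, hypothetical illustration (the function name and calibration setup are not taken from this repository): each weight is scored by its magnitude times the L2 norm of the corresponding input feature over a small calibration set, and the lowest-scoring 10% of weights in each output row are zeroed.

```python
import torch

def wanda_prune_linear(weight: torch.Tensor,
                       calib_inputs: torch.Tensor,
                       sparsity: float = 0.10) -> torch.Tensor:
    """Sketch of Wanda pruning for one linear layer (no retraining).

    weight:       (out_features, in_features) layer weight matrix.
    calib_inputs: (n_tokens, in_features) activations seen by this layer
                  while running a small calibration set through the model.
    sparsity:     fraction of weights to zero in each output row.
    """
    # Wanda score: |W_ij| * ||X_j||_2, i.e. weight magnitude scaled by the
    # L2 norm of the j-th input feature across the calibration tokens.
    feature_norms = calib_inputs.norm(p=2, dim=0)        # (in_features,)
    scores = weight.abs() * feature_norms.unsqueeze(0)   # (out, in)

    pruned = weight.clone()
    n_prune = int(weight.shape[1] * sparsity)
    if n_prune > 0:
        # Zero the n_prune lowest-scoring weights in every output row.
        low_idx = scores.topk(n_prune, dim=1, largest=False).indices
        pruned.scatter_(1, low_idx, 0.0)
    return pruned
```

Because the scores depend only on existing weights and a forward pass over calibration data, the procedure leaves all remaining weights untouched, which is why no fine-tuning step is needed afterwards.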

Key Capabilities & Features

  • Base Model: Built upon the Llama 2 7B Chat model, which is optimized for dialogue use cases.
  • Sparsity: Incorporates 10% sparsity via Wanda pruning, which can reduce memory footprint and potentially speed up inference (see the loading sketch after this list).
  • Architecture: Uses an optimized transformer architecture; the base chat model was tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) for helpfulness and safety.
  • Context Length: Supports a context length of 4096 tokens.
  • Language: English; intended for commercial and research use.
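As a quick way to confirm the advertised sparsity level, the hedged sketch below loads the checkpoint with the Hugging Face transformers library and counts zero-valued entries in the linear weight matrices; the exact figure will depend on which layers were pruned.

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "wang7776/Llama-2-7b-chat-hf-10-sparsity"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Count zero-valued weights in the 2-D (linear) weight matrices only.
zeros, total = 0, 0
for name, param in model.named_parameters():
    if param.dim() == 2 and "weight" in name:
        zeros += (param == 0).sum().item()
        total += param.numel()

print(f"Measured sparsity over linear weights: {zeros / total:.2%}")
```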

Intended Use Cases

  • Dialogue Applications: Optimized for assistant-like chat functionality (a minimal inference sketch follows this list).
  • Research: Suitable for research into sparse models and their performance characteristics.
  • Commercial Use: Permitted under a custom commercial license from Meta.
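
For assistant-style chat, a minimal generation sketch might look like the following; it assumes the repository keeps the standard Llama 2 chat template in its tokenizer configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Llama-2-7b-chat-hf-10-sparsity"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a single-turn chat prompt using the tokenizer's chat template.
messages = [{"role": "user", "content": "Give me three tips for writing clear commit messages."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```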