nota-ai/st-vicuna-v1.3-5.5b-ppl
Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Mar 22, 2024 | Architecture: Transformer
The nota-ai/st-vicuna-v1.3-5.5b-ppl model, developed by Nota AI, is a 5.5 billion parameter depth-pruned version of the Vicuna-v1.3-7B large language model. To make text generation more efficient, its depth was reduced by 20%: whole Transformer blocks were removed, chosen with a perplexity (PPL) criterion. It is intended for research and non-commercial projects that need a more compact yet capable LLM derived from Vicuna.
Shortened LLaMA: Efficient Depth-Pruned LLMs
The nota-ai/st-vicuna-v1.3-5.5b-ppl model is part of the Shortened LLaMA series developed by Nota AI, focusing on creating more efficient large language models through depth pruning. This specific model is a 5.5 billion parameter variant, derived from the Vicuna-v1.3-7B model by reducing its depth by 20%.
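As a rough check on where the 5.5B figure comes from, the arithmetic below counts parameters for a LLaMA-7B-style decoder, the architecture Vicuna-v1.3-7B is fine-tuned from. It assumes the standard LLaMA-7B configuration (hidden size 4096, MLP size 11008, 32 blocks, 32,000-token vocabulary, untied input/output embeddings) and reads "20% of the depth" as removing round(0.2 × 32) = 6 blocks; neither assumption is stated on this card, and the helper names are hypothetical.

```python
# Approximate parameter counts for a LLaMA-7B-style decoder.
# ASSUMPTION: standard LLaMA-7B config values; LLaMA uses no bias terms,
# so only weight matrices, embeddings, and RMSNorm scales are counted.
HIDDEN = 4096
INTERMEDIATE = 11008
VOCAB = 32000
NUM_BLOCKS = 32


def block_params(hidden: int = HIDDEN, intermediate: int = INTERMEDIATE) -> int:
    """Parameters in one Transformer block (attention + SwiGLU MLP + 2 RMSNorms)."""
    attention = 4 * hidden * hidden        # q, k, v, o projections
    mlp = 3 * hidden * intermediate        # gate, up, down projections
    norms = 2 * hidden                     # input and post-attention RMSNorm
    return attention + mlp + norms


def model_params(num_blocks: int) -> int:
    """Total parameters: blocks + input embedding + LM head + final norm."""
    embeddings = 2 * VOCAB * HIDDEN        # untied embedding and LM head
    return num_blocks * block_params() + embeddings + HIDDEN


def pruned_block_count(num_blocks: int, ratio: float) -> int:
    """Blocks kept after removing round(ratio * num_blocks) of them."""
    return num_blocks - round(ratio * num_blocks)


if __name__ == "__main__":
    kept = pruned_block_count(NUM_BLOCKS, 0.20)
    print(f"base:   {model_params(NUM_BLOCKS) / 1e9:.2f}B params, {NUM_BLOCKS} blocks")
    print(f"pruned: {model_params(kept) / 1e9:.2f}B params, {kept} blocks")
```

Under these assumptions the base model comes to about 6.74B parameters and a 26-block model to about 5.52B, consistent with the model's name. Because depth pruning leaves the embeddings untouched, the parameter reduction (about 18%) is slightly smaller than the 20% depth reduction.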
Key Capabilities & Features
- Efficient Text Generation: Achieved through a depth-pruning method that identifies and removes the least important Transformer blocks, so each generated token passes through fewer sequential layers.
- Pruning Method: Blocks are removed in a single one-shot step, ranked by a Perplexity (PPL) criterion in this model variant, after which the pruned model is lightly retrained with LoRA.
- Reduced Parameter Count: At 5.5B parameters, it is more compact than its 7B-parameter base model, making it suitable for resource-constrained environments.
- Research-Oriented: Intended primarily for research and non-commercial projects, as indicated by its license.
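A minimal sketch of how a one-shot, PPL-based block selection could work, assuming the criterion scores each block by the perplexity the model reaches when only that block is skipped on a small calibration set, and then prunes the blocks whose removal raises perplexity least, all at once. The expensive model evaluation is abstracted into a hypothetical `ppl_without_block` callable, the function names are illustrative, and the perplexities in the demo are made up. The LoRA retraining that follows pruning is not shown.

```python
import math
from typing import Callable, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def select_blocks_ppl(num_blocks: int,
                      ppl_without_block: Callable[[int], float],
                      num_to_prune: int) -> List[int]:
    """Return the indices of the blocks to remove, in ascending order.

    Block i's score is the perplexity of the model with only block i
    skipped: a low value means the model barely needs that block. The
    num_to_prune lowest-scoring blocks are chosen together (one-shot),
    without re-scoring after each removal.
    """
    if not 0 <= num_to_prune <= num_blocks:
        raise ValueError("num_to_prune must be between 0 and num_blocks")
    scores = [ppl_without_block(i) for i in range(num_blocks)]
    ranked = sorted(range(num_blocks), key=lambda i: scores[i])  # least important first
    return sorted(ranked[:num_to_prune])


def kept_blocks(num_blocks: int, pruned: Sequence[int]) -> List[int]:
    """Indices of the surviving blocks, in their original order."""
    drop = set(pruned)
    return [i for i in range(num_blocks) if i not in drop]


if __name__ == "__main__":
    # Made-up perplexities for an 8-block toy model, NOT real measurements.
    # In practice each value would come from running the LLM on calibration
    # text with that block skipped, then applying perplexity() to the outputs.
    toy_ppl = [40.1, 12.3, 12.9, 12.1, 13.5, 12.2, 15.0, 60.7]
    pruned = select_blocks_ppl(8, lambda i: toy_ppl[i], num_to_prune=2)
    print("prune:", pruned)  # blocks 3 and 5 have the lowest PPL
    print("keep: ", kept_blocks(8, pruned))
```

Scoring every block against the same full model and pruning in one pass keeps the cost to one calibration run per block, at the price of ignoring interactions between removed blocks.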
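The light retraining step relies on LoRA, which freezes each original weight matrix W0 and learns only a low-rank update. Below is a minimal pure-Python sketch of one LoRA-adapted linear layer; the class and function names, the initialization scale, and the rank of 8 used in the parameter count are illustrative assumptions, not the settings used to retrain this model.

```python
import random


def matvec(m, x):
    """Dense matrix-vector product for row-major nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update.

    y = W0 x + (alpha / r) * B (A x), where A is r x d_in and B is d_out x r.
    B starts at zero, so before retraining the layer reproduces W0 x exactly.
    """

    def __init__(self, w0, rank, alpha, seed=0):
        self.w0 = w0  # frozen base weight, d_out x d_in
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        # Only A and B would be updated during retraining (training loop not shown).
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight."""
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
    print(layer([1.0, 1.0]))  # B is zero, so this equals W0 x
    full = 4096 * 4096
    lora = lora_param_count(4096, 4096, rank=8)  # rank 8 is hypothetical
    print(f"trainable fraction for one 4096x4096 projection: {lora / full:.4%}")
```

Because only A and B are trained, retraining touches a small fraction of the weights, which is why it can be kept light after pruning.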
Good For
- Resource-Constrained Deployments: Ideal for scenarios where a smaller model footprint is beneficial without a drastic reduction in performance.
- LLM Compression Research: Provides a practical example and benchmark for studying depth pruning techniques in large language models.
- Non-Commercial Applications: Suitable for academic research, personal projects, and other non-profit uses that want a Vicuna-v1.3-based model in a more efficient form.
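For resource-constrained deployments, a back-of-the-envelope memory estimate shows what removing blocks buys. The sketch assumes LLaMA-7B dimensions (hidden size 4096, full multi-head attention without grouped-query attention), 26 remaining blocks for the pruned model (an inference from the 20% depth reduction, not a figure from this card), the nominal 7B and 5.5B parameter counts, and the 4k context length from the listing. It counts only weights and the key/value cache, ignoring activations and runtime overhead; the function names are hypothetical.

```python
HIDDEN = 4096        # ASSUMPTION: LLaMA-7B hidden size
CONTEXT = 4096       # 4k context length from the listing
BASE_BLOCKS = 32     # ASSUMPTION: Vicuna-v1.3-7B depth
PRUNED_BLOCKS = 26   # ASSUMPTION: 32 blocks minus 20%, rounded


def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def kv_cache_gb(num_blocks: int, seq_len: int = CONTEXT, hidden: int = HIDDEN,
                bits: int = 16, batch: int = 1) -> float:
    """KV cache for full multi-head attention: one K and one V vector of size
    `hidden` per token, per block, so it scales linearly with depth."""
    return 2 * num_blocks * seq_len * hidden * bits / 8 * batch / 1e9


if __name__ == "__main__":
    for name, params, blocks in [("base", 7e9, BASE_BLOCKS),
                                 ("pruned", 5.5e9, PRUNED_BLOCKS)]:
        print(f"{name:>6}: weights FP16 {weight_gb(params, 16):.1f} GB, "
              f"FP8 {weight_gb(params, 8):.1f} GB, "
              f"KV cache FP16 @4k {kv_cache_gb(blocks):.2f} GB")
```

Because each block holds its own slice of the KV cache, depth pruning also shrinks the cache in proportion to the blocks removed (under these assumptions, from about 2.15 GB to about 1.74 GB per 4k-token sequence at FP16), on top of the weight savings.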