nota-ai/st-vicuna-v1.3-5.5b-ppl

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Mar 22, 2024 · Architecture: Transformer

The nota-ai/st-vicuna-v1.3-5.5b-ppl model, developed by Nota AI, is a 5.5 billion parameter depth-pruned version of the Vicuna-v1.3-7B large language model. Its depth is reduced by 20%, with the Transformer blocks to remove selected by a perplexity (PPL) criterion, to make text generation more efficient. It is intended for research and non-commercial projects that need a more compact yet capable LLM derived from the Vicuna architecture.

Shortened LLaMA: Efficient Depth-Pruned LLMs

The nota-ai/st-vicuna-v1.3-5.5b-ppl model is part of the Shortened LLaMA series developed by Nota AI, focusing on creating more efficient large language models through depth pruning. This specific model is a 5.5 billion parameter variant, derived from the Vicuna-v1.3-7B model by reducing its depth by 20%.
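The relationship between removed blocks and parameter count can be checked with some back-of-the-envelope arithmetic. The sketch below assumes the usual LLaMA-7B-style shape for the base model (32 blocks, hidden size 4096, MLP size 11008, vocabulary 32000, untied output head); none of these values are stated on this page. The choice of six removed blocks is likewise an illustrative assumption, picked because it lands near the 5.5B figure quoted above, and should not be read as the authors' reported configuration.

```python
# Assumed LLaMA-7B-style shape; these values are not stated in this card.
HIDDEN = 4096
INTERMEDIATE = 11008
VOCAB = 32000
NUM_BLOCKS = 32


def block_params(hidden: int = HIDDEN, intermediate: int = INTERMEDIATE) -> int:
    """Parameters in one Transformer block under the assumed shape."""
    attention = 4 * hidden * hidden    # q, k, v, o projections, no biases
    mlp = 3 * hidden * intermediate    # gate, up, down projections
    norms = 2 * hidden                 # two RMSNorm weight vectors
    return attention + mlp + norms


def model_params(num_blocks: int) -> int:
    """Total parameters for a model with `num_blocks` blocks."""
    embeddings = 2 * VOCAB * HIDDEN    # input embedding + untied LM head
    final_norm = HIDDEN
    return num_blocks * block_params() + embeddings + final_norm


if __name__ == "__main__":
    removed = 6  # hypothetical number of pruned blocks
    base = model_params(NUM_BLOCKS)
    pruned = model_params(NUM_BLOCKS - removed)
    print(f"base:   {base / 1e9:.2f}B parameters")
    print(f"pruned: {pruned / 1e9:.2f}B parameters")
    print(f"share of blocks removed: {removed / NUM_BLOCKS:.1%}")
```

Because the embedding and output layers are untouched by depth pruning, the fraction of parameters removed is slightly smaller than the fraction of blocks removed.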

Key Capabilities & Features

  • Efficient Text Generation: Achieved through a novel depth-pruning method that identifies and removes less important Transformer blocks.
  • Pruning Method: Blocks are removed in a single one-shot pruning pass, and the pruned model is then lightly retrained with LoRA. This variant selects the blocks to remove with a perplexity (PPL) criterion.
  • Reduced Parameter Count: Offers a more compact model (5.5B parameters) compared to its 7B parameter base model, making it suitable for environments with resource constraints.
  • Research-Oriented: Intended primarily for research and non-commercial projects, as indicated by its license.
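The idea of ranking blocks with a perplexity criterion and removing the least important ones in one shot can be illustrated with a toy example. This is a minimal sketch, not Nota AI's implementation: the "blocks" are hypothetical residual functions on a small vector, the calibration data is random, and `make_block`, `perplexity`, `rank_blocks_by_ppl`, and `prune_one_shot` are names invented here. It assumes the criterion works by removing one block at a time, measuring perplexity on a calibration set, and treating blocks whose removal leaves perplexity lowest as the least important. The LoRA retraining step is not shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]
Example = Tuple[Vector, int]  # (input vector, index of the target "token")


def make_block(weight: float, shift: int) -> Block:
    """Hypothetical stand-in for a Transformer block: a residual update."""
    def block(x: Vector) -> Vector:
        n = len(x)
        return [x[i] + weight * math.tanh(x[(i + shift) % n]) for i in range(n)]
    return block


def perplexity(blocks: Sequence[Block], data: Sequence[Example]) -> float:
    """exp(mean negative log-likelihood), using the final vector as logits."""
    total_nll = 0.0
    for x, target in data:
        h = list(x)
        for block in blocks:
            h = block(h)
        m = max(h)
        log_z = m + math.log(sum(math.exp(v - m) for v in h))  # stable log-sum-exp
        total_nll += log_z - h[target]
    return math.exp(total_nll / len(data))


def rank_blocks_by_ppl(blocks: Sequence[Block],
                       data: Sequence[Example]) -> List[Tuple[int, float]]:
    """For each block, the PPL with only that block removed, sorted ascending."""
    scores = []
    for i in range(len(blocks)):
        without_i = list(blocks[:i]) + list(blocks[i + 1:])
        scores.append((i, perplexity(without_i, data)))
    return sorted(scores, key=lambda s: s[1])


def prune_one_shot(blocks: Sequence[Block], data: Sequence[Example],
                   num_remove: int) -> List[Block]:
    """Drop the `num_remove` blocks whose individual removal hurts PPL least."""
    drop = {i for i, _ in rank_blocks_by_ppl(blocks, data)[:num_remove]}
    return [b for i, b in enumerate(blocks) if i not in drop]


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = [make_block(w, s) for w, s in [(0.9, 1), (0.0, 2), (0.5, 3), (0.05, 1)]]
    data = [([rng.uniform(-1, 1) for _ in range(8)], rng.randrange(8))
            for _ in range(32)]
    print("baseline PPL:", perplexity(blocks, data))
    for index, ppl in rank_blocks_by_ppl(blocks, data):
        print(f"without block {index}: PPL = {ppl:.4f}")
    print("blocks kept:", len(prune_one_shot(blocks, data, num_remove=1)))
```

The ranking is computed once on the unpruned model and all selected blocks are dropped together, which is what makes the pass one-shot rather than iterative.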

Good For

  • Resource-Constrained Deployments: Suited to scenarios that need a smaller model footprint than the 7B base without a drastic drop in performance.
  • LLM Compression Research: Provides a practical example and benchmark for studying depth pruning techniques in large language models.
  • Non-Commercial Applications: Suitable for academic research, personal projects, and other non-profit uses where the Vicuna-v1.3 architecture is desired in a more efficient form.
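For trying the model out, a minimal loading sketch with the Hugging Face `transformers` library might look like the following. It assumes the repository loads through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes and that the usual Vicuna v1.1-style conversation template applies. Neither is confirmed on this page, so check the upstream model card before relying on it. `build_prompt` and `generate` are helper names introduced here for illustration.

```python
MODEL_ID = "nota-ai/st-vicuna-v1.3-5.5b-ppl"

# Assumed Vicuna-style system prompt; not stated in this card.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(user_message: str) -> str:
    """Wrap a single user turn in the assumed Vicuna conversation template."""
    return f"{VICUNA_SYSTEM} USER: {user_message} ASSISTANT:"


def generate(user_message: str, max_new_tokens: int = 64) -> str:
    """Download the model and greedily generate a reply to one user message."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Loads in the default precision; half precision on a GPU is a common
    # alternative when memory is tight.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the tokens produced after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Explain depth pruning in one sentence."))
```

Running the script downloads the full model weights, so it needs several gigabytes of disk space and memory.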