Name: nota-ai/st-vicuna-v1.3-5.5b-ppl API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nota-ai

Shortened LLaMA: Efficient Depth-Pruned LLMs

The nota-ai/st-vicuna-v1.3-5.5b-ppl model belongs to the Shortened LLaMA series from Nota AI, which makes large language models more efficient through depth pruning, that is, by removing whole Transformer blocks. This model is a 5.5 billion parameter variant derived from Vicuna-v1.3-7B by reducing its depth by 20%.

Key Capabilities & Features

Efficient Text Generation: A depth-pruning method identifies and removes the less important Transformer blocks, so fewer blocks run for each generated token.
Pruning Method: Blocks are removed in a single one-shot pruning step, followed by light LoRA-based retraining. This variant selects the blocks to remove using a perplexity (PPL) criterion.
Reduced Parameter Count: Offers a more compact model (5.5B parameters) compared to its 7B parameter base model, making it suitable for environments with resource constraints.
Research-Oriented: Intended primarily for research and non-commercial projects, as indicated by its license.
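
The PPL-based, one-shot selection described above can be sketched in a few lines. This is a minimal illustration, not Nota AI's implementation: it assumes a block's importance is judged by the perplexity measured on a calibration set with that block removed, and that the blocks whose removal hurts perplexity least are dropped together. The function names and the toy perplexity values are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rank_blocks_by_ppl(num_blocks: int,
                       ppl_without: Callable[[int], float]) -> List[int]:
    """Order blocks from least to most important.

    ppl_without(i) is assumed to return the calibration perplexity of the
    model with block i removed; a low value means the block matters little.
    """
    scores = {i: ppl_without(i) for i in range(num_blocks)}
    return sorted(scores, key=lambda i: scores[i])


def one_shot_prune(num_blocks: int,
                   ppl_without: Callable[[int], float],
                   n_prune: int) -> List[int]:
    """Remove the n_prune least important blocks in one step; return kept indices."""
    to_remove = set(rank_blocks_by_ppl(num_blocks, ppl_without)[:n_prune])
    return [i for i in range(num_blocks) if i not in to_remove]


# Toy example: made-up perplexities for an 8-block model (not measured values).
toy_ppl = [95.0, 14.2, 12.9, 12.4, 12.6, 13.1, 15.8, 40.3]
kept = one_shot_prune(len(toy_ppl), lambda i: toy_ppl[i], n_prune=2)
print(kept)  # blocks 3 and 4 are dropped
```

In a real pipeline, `ppl_without` would run the pruned network over calibration text, and the LoRA-based retraining step would follow the removal.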

Good For

Resource-Constrained Deployments: Ideal for scenarios where a smaller model footprint is beneficial without a drastic reduction in performance.
LLM Compression Research: Provides a practical example and benchmark for studying depth pruning techniques in large language models.
Non-Commercial Applications: Suitable for academic research, personal projects, and other non-profit uses where the Vicuna-v1.3 architecture is desired in a more efficient form.
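
To give a rough sense of the footprint difference, the sketch below estimates the memory needed for the weights alone. It assumes 16-bit weights (2 bytes per parameter) and the nominal parameter counts quoted above; it ignores the KV cache, activations, and framework overhead, so actual usage will be higher. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


pruned_gb = weight_memory_gb(5.5e9)  # 5.5B parameters, 16-bit weights assumed
base_gb = weight_memory_gb(7e9)      # nominal 7B base model
saving = 1 - pruned_gb / base_gb     # fraction of weight memory saved

print(f"pruned: {pruned_gb:.1f} GB, base: {base_gb:.1f} GB, saving: {saving:.0%}")
# prints: pruned: 11.0 GB, base: 14.0 GB, saving: 21%
```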
