nota-ai/st-vicuna-v1.3-5.5b-taylor

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Mar 22, 2024 | Architecture: Transformer

The nota-ai/st-vicuna-v1.3-5.5b-taylor model, developed by Nota AI, is a 5.5-billion-parameter, depth-pruned version of the Vicuna-v1.3-7B large language model. It was produced with a one-shot pruning method that identifies and removes unimportant Transformer blocks, ranked with the Taylor+ pruning criterion, followed by light LoRA-based retraining. Pruning reduces the original 7B parameter count by about 20% while aiming to preserve the base model's performance. The model is intended for research and non-commercial projects that need a more compact LLM.
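The light LoRA-based retraining mentioned above follows the usual LoRA idea: each pretrained weight W stays frozen, and only a low-rank update (alpha / r) · B A is trained. The NumPy code below is a minimal sketch of that standard formulation, not Nota AI's training code. The class name `LoRALinear` and the rank/alpha values are hypothetical. B is initialized to zero, so the pruned model's outputs are unchanged when retraining starts.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                         # frozen, shape (out, in)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))     # trainable
        self.B = np.zeros((out_features, rank))                      # trainable, zero-init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # After training, the update can be folded into W for zero-overhead inference.
        return self.weight + self.scale * self.B @ self.A

    def num_trainable(self) -> int:
        return self.A.size + self.B.size


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
layer = LoRALinear(W, rank=2, alpha=4.0)
x = rng.normal(size=(3, 6))
print(np.allclose(layer(x), x @ W.T))   # True: zero-init B leaves outputs unchanged
layer.B = rng.normal(size=(4, 2))       # pretend some training happened
print(np.allclose(layer(x), x @ layer.merged_weight().T))  # True
print(layer.num_trainable())            # 2 * (6 + 4) = 20
```

Only rank × (in + out) values per layer are trained. This is why the retraining stage is "light" compared with full fine-tuning.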


Model Overview

The nota-ai/st-vicuna-v1.3-5.5b-taylor is a 5.5 billion parameter language model developed by Nota AI. It is a depth-pruned version of the Vicuna-v1.3-7B model, specifically optimized for efficient text generation. This model achieves a 20% reduction in parameters from its 7B base model by identifying and pruning unimportant Transformer blocks.
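The parameter count can be checked with simple arithmetic, assuming Vicuna-v1.3-7B keeps the standard LLaMA-7B dimensions: 32 blocks, hidden size 4096, MLP size 11008, vocabulary 32000, no biases, untied embeddings. Removing whole blocks changes the total in steps of about 0.2B. Under these assumptions, keeping 26 of 32 blocks lands near 5.5B. That block count is our inference and is not stated on this card.

```python
def llama_param_count(
    n_layers: int = 32,
    hidden: int = 4096,
    intermediate: int = 11008,
    vocab: int = 32000,
) -> int:
    """Parameter count of a LLaMA-style decoder (no biases, untied embeddings)."""
    attn = 4 * hidden * hidden              # q, k, v, o projections
    mlp = 3 * hidden * intermediate         # gate, up, down projections
    norms = 2 * hidden                      # two RMSNorm weight vectors per block
    per_block = attn + mlp + norms
    embed_and_head = 2 * vocab * hidden     # input embeddings + LM head
    final_norm = hidden
    return n_layers * per_block + embed_and_head + final_norm


full = llama_param_count(32)
pruned = llama_param_count(26)              # assumption: 6 of 32 blocks removed
print(f"{full:,}")                          # 6,738,415,616
print(f"{pruned / 1e9:.2f}B")               # 5.52B
print(f"{1 - pruned / full:.1%} removed")   # 18.0% removed
```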

Key Characteristics

  • Depth Pruning: Removes entire Transformer blocks in a single (one-shot) pruning step, then applies light LoRA-based retraining to recover quality.
  • Taylor+ Criterion: Ranks Transformer blocks with the Taylor+ importance criterion and removes the least important ones.
  • Efficiency Focused: Designed to offer a more compact alternative to larger LLMs, making it suitable for environments with resource constraints.
  • Non-Commercial License: Intended strictly for research and non-commercial projects.
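One-shot block selection with a Taylor-style score can be sketched as follows. Each block is scored by the first-order Taylor term, the sum of |w · ∂L/∂w| over its weights, which estimates how much the loss would change if the block were removed. The lowest-scoring blocks are then dropped in one pass. This is a simplified illustration, not Nota AI's implementation. The `protect_first`/`protect_last` knobs are a hypothetical way to keep end blocks out of the candidate pool, and the function names are illustrative. In practice the gradients would come from a backward pass on calibration text. Here they are plain lists.

```python
from typing import List, Sequence


def taylor_block_score(weights: Sequence[float], grads: Sequence[float]) -> float:
    """First-order Taylor importance of one block: sum of |w * dL/dw|."""
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have the same length")
    return sum(abs(w * g) for w, g in zip(weights, grads))


def select_blocks_to_prune(
    scores: Sequence[float],
    n_prune: int,
    protect_first: int = 0,
    protect_last: int = 0,
) -> List[int]:
    """Pick the n_prune lowest-scoring blocks in one shot (no iterative re-scoring)."""
    n_blocks = len(scores)
    candidates = list(range(protect_first, n_blocks - protect_last))
    if n_prune > len(candidates):
        raise ValueError("not enough candidate blocks to prune")
    ranked = sorted(candidates, key=lambda i: scores[i])
    return sorted(ranked[:n_prune])


def remove_blocks(blocks: Sequence, to_remove: Sequence[int]) -> list:
    """Return the remaining blocks in their original order."""
    drop = set(to_remove)
    return [b for i, b in enumerate(blocks) if i not in drop]


scores = [5.0, 1.0, 0.5, 3.0, 0.2, 4.0]
print(select_blocks_to_prune(scores, 2))                                    # [2, 4]
print(select_blocks_to_prune(scores, 2, protect_first=1, protect_last=2))   # [1, 2]
print(remove_blocks(["b0", "b1", "b2", "b3", "b4", "b5"], [2, 4]))          # ['b0', 'b1', 'b3', 'b5']
```

Scoring once and removing all selected blocks together is what makes the procedure one-shot. An iterative variant would re-score after each removal at extra cost.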

Use Cases

This model is particularly well-suited for:

  • Research on Model Compression: Ideal for researchers exploring methods of making large language models more efficient.
  • Resource-Constrained Deployments: Suitable for applications where a smaller model footprint is critical, provided the performance trade-offs are acceptable.
  • Non-Commercial Applications: Can be used in academic or personal projects that do not involve commercial use.
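A rough way to judge the "smaller footprint" for deployment is to estimate weight memory as parameters × bytes per parameter. The sketch below compares the base and pruned models in FP16 and FP8 (the quantization noted in the listing metadata). It uses 6.74e9 as the approximate Vicuna-7B count and 5.5e9 for this model. It counts weights only and ignores the KV cache, activations and framework overhead, so real usage will be higher.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """GiB needed just to hold the weights (excludes KV cache and activations)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for name, n_params in [("7B base", 6.74e9), ("5.5B pruned", 5.5e9)]:
    for dtype in ("fp16", "fp8"):
        print(f"{name:12s} {dtype}: {weight_memory_gib(n_params, dtype):5.1f} GiB")
```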

For more technical details, refer to the associated paper: Shortened LLaMA: A Simple Depth Pruning for Large Language Models.