nota-ai/st-llama-1-5.5b-taylor

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Mar 22, 2024 | Architecture: Transformer

The nota-ai/st-llama-1-5.5b-taylor model, developed by Nota AI, is a 5.5 billion parameter depth-pruned version of the LLaMA-1-7B model, built for more efficient text generation. Whole Transformer blocks are removed using the Taylor+ criterion, which shrinks the model while aiming to preserve the base model's performance. It is intended for research and non-commercial projects that need a more compact LLaMA-based language model.


Model Overview

The nota-ai/st-llama-1-5.5b-taylor model is a 5.5 billion parameter language model developed by Nota AI. It is a depth-pruned variant of the original LLaMA-1-7B model, created by identifying and removing the less important Transformer blocks. This version applies the Taylor+ criterion to select blocks in a single one-shot pruning pass, then recovers quality with light LoRA-based retraining, for a roughly 20% reduction in parameters relative to the 7B base model.
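To make the block-selection idea concrete, here is a minimal sketch of Taylor-style depth pruning on toy data. It assumes a first-order Taylor importance score (the sum of |weight x gradient| over a block's parameters) and one-shot removal of the lowest-scoring blocks. The function names, the toy scores, and the keep_first/keep_last arguments are hypothetical illustrations: this card does not spell out what the "+" in Taylor+ adds, so protecting blocks at either end of the stack is only one plausible reading, not a statement of Nota AI's implementation.

```python
from typing import List, Sequence


def taylor_block_score(weights: Sequence[float], grads: Sequence[float]) -> float:
    """First-order Taylor importance of one block: sum of |weight * gradient|."""
    return sum(abs(w * g) for w, g in zip(weights, grads))


def select_blocks_to_prune(
    scores: Sequence[float],
    n_prune: int,
    keep_first: int = 0,  # hypothetical: number of leading blocks never pruned
    keep_last: int = 0,   # hypothetical: number of trailing blocks never pruned
) -> List[int]:
    """Return indices of the n_prune least important blocks, in ascending order."""
    candidates = range(keep_first, len(scores) - keep_last)
    ranked = sorted(candidates, key=lambda i: scores[i])
    return sorted(ranked[:n_prune])


def prune_blocks(blocks: Sequence, to_remove: Sequence[int]) -> list:
    """One-shot depth pruning: drop the selected blocks, keep the rest in order."""
    drop = set(to_remove)
    return [block for i, block in enumerate(blocks) if i not in drop]


# Toy importance scores for a six-block stack (made-up numbers).
toy_scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]
removed = select_blocks_to_prune(toy_scores, n_prune=2, keep_first=1, keep_last=1)
remaining = prune_blocks(["b0", "b1", "b2", "b3", "b4", "b5"], removed)
print(removed, remaining)
```

In the toy run, block 0 has the second-lowest score but survives because it is protected, so blocks 2 and 4 are removed instead. In a real setting the scores would come from gradients on a calibration set, and the surviving blocks would then be retrained with LoRA.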

Key Characteristics

  • Efficient Text Generation: Designed for more efficient text generation by reducing model size through depth pruning.
  • Pruning Method: One-shot depth pruning that removes whole Transformer blocks selected by the Taylor+ criterion, followed by light LoRA-based retraining.
  • Base Model: Derived from the LLaMA-1-7B architecture.
  • Parameter Count: Reduced to 5.5 billion parameters from the original 7 billion, offering a more compact footprint.
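The parameter figures above can be sanity-checked with a little arithmetic. The sketch below uses the standard LLaMA-1-7B dimensions (hidden size 4096, MLP size 11008, 32 blocks, vocabulary 32000, untied input and output embeddings). The figure of 26 remaining blocks is an assumption, chosen because it lands on roughly 5.5 billion parameters; it is not stated on this page.

```python
HIDDEN = 4096         # LLaMA-1-7B hidden size
INTERMEDIATE = 11008  # LLaMA-1-7B MLP inner size
VOCAB = 32000         # LLaMA-1 vocabulary size
BASE_BLOCKS = 32      # Transformer blocks in LLaMA-1-7B
PRUNED_BLOCKS = 26    # assumption: six blocks removed


def block_params(hidden: int = HIDDEN, intermediate: int = INTERMEDIATE) -> int:
    """Parameters in one LLaMA block (no biases)."""
    attention = 4 * hidden * hidden  # q, k, v and output projections
    mlp = 3 * hidden * intermediate  # gate, up and down projections
    norms = 2 * hidden               # two RMSNorm weight vectors
    return attention + mlp + norms


def model_params(n_blocks: int, hidden: int = HIDDEN, vocab: int = VOCAB) -> int:
    """Total parameters for a LLaMA-style model with n_blocks blocks."""
    embeddings = 2 * vocab * hidden  # input embedding plus untied output head
    final_norm = hidden
    return n_blocks * block_params(hidden) + embeddings + final_norm


base = model_params(BASE_BLOCKS)
pruned = model_params(PRUNED_BLOCKS)
reduction = 1 - pruned / base
print(f"base:      {base:,}")
print(f"pruned:    {pruned:,}")
print(f"reduction: {reduction:.1%}")
```

Under these assumptions the base model has about 6.74 billion parameters and the pruned model about 5.52 billion, a reduction of about 18% against the exact base count. The "20%" quoted for this model is consistent with rounding against the nominal 7B figure.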

Intended Use Cases

This model is primarily intended for:

  • Research Projects: Exploring the effects of structured pruning on large language models.
  • Non-Commercial Applications: Developing and experimenting with LLMs where efficiency and a smaller model size are beneficial.
  • Comparative Studies: Benchmarking the performance of depth-pruned models against their larger counterparts.
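For comparative studies, one common yardstick is perplexity on a held-out text. The sketch below is a hypothetical helper, not part of any Nota AI tooling: it assumes you have already collected per-token natural-log probabilities from each model on the same text, and the numbers in the demo are made up purely to show the calculation.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean per-token natural-log probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_change(pruned_value: float, base_value: float) -> float:
    """Fractional change of the pruned model's metric relative to the base model."""
    return (pruned_value - base_value) / base_value


# Made-up log-probabilities standing in for real model outputs.
base_log_probs = [math.log(0.5)] * 4     # every token predicted with p = 0.5
pruned_log_probs = [math.log(0.25)] * 4  # every token predicted with p = 0.25

base_ppl = perplexity(base_log_probs)
pruned_ppl = perplexity(pruned_log_probs)
print(base_ppl, pruned_ppl, relative_change(pruned_ppl, base_ppl))
```

Lower perplexity is better, so a positive relative change means the pruned model fits the text worse than the base model. Both models should be scored on identical text with the same tokenizer for the comparison to be meaningful.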