arcee-ai/Llama-3.1-SuperNova-Lite

Hugging Face
Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 32k · Concurrency Cost: 1 · Published: Sep 10, 2024 · License: llama3 · Architecture: Transformer

Llama-3.1-SuperNova-Lite is an 8 billion parameter model developed by Arcee.ai, based on the Llama-3.1-8B-Instruct architecture with a 32768-token context length. It is a distilled version of the larger Llama-3.1-405B-Instruct, trained on offline logits from the teacher to retain high performance in a compact form. It excels at instruction following and domain-specific adaptation, making it well suited for organizations that need efficient, high-performance LLMs.


Overview

Arcee.ai's Llama-3.1-SuperNova-Lite is an 8 billion parameter language model built upon the Llama-3.1-8B-Instruct architecture. It features a substantial 32768-token context window, enabling it to process and generate longer, more complex sequences of text.

Key Capabilities & Distillation

This model is a distilled variant of the much larger Llama-3.1-405B-Instruct, produced with a distillation pipeline that trains against offline logits extracted from the 405B parameter teacher. This process lets Llama-3.1-SuperNova-Lite retain much of the teacher's performance and its strong instruction-following behavior while remaining far more compact and efficient to serve. The instruction dataset used for training was generated with EvolKit, which improves the quality and diversity of the training instructions across tasks.
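The offline-logit distillation described above can be illustrated with a generic knowledge-distillation objective: the student's temperature-softened output distribution is pushed toward the teacher's via a KL-divergence loss. This is a minimal NumPy sketch of that standard technique, not Arcee.ai's actual pipeline; the toy logits are invented for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """Temperature-scaled KL(teacher || student), averaged over positions.

    Generic knowledge-distillation loss (illustrative only): with offline
    distillation, teacher_logits are precomputed once and reused for every
    student training step, so the large teacher never runs during training.
    """
    p = softmax(np.asarray(teacher_logits, dtype=float) / temperature)
    q = softmax(np.asarray(student_logits, dtype=float) / temperature)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(temperature ** 2 * kl.mean())

teacher = np.array([[4.0, 1.0, -2.0]])   # toy "teacher" logits over a 3-token vocab
student = np.array([[3.5, 1.2, -1.5]])   # toy "student" logits
print(distillation_loss(student, teacher))
```

The loss is zero when the student matches the teacher exactly and grows as their softened distributions diverge, which is what drives the student toward the teacher's behavior.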

Performance & Use Cases

Llama-3.1-SuperNova-Lite demonstrates strong performance in both benchmark evaluations and practical applications. Its compact size combined with the power derived from its larger counterpart makes it an ideal choice for organizations seeking high-performance language models with reduced resource requirements. It is particularly well-suited for scenarios demanding robust instruction-following and adaptability to specific domains.

Benchmark Highlights

Evaluations on the Open LLM Leaderboard show competitive results:

  • IFEval (0-shot): 80.17
  • BBH (3-shot): 31.57
  • MMLU-PRO (5-shot): 31.97

For more details on its training methodology, refer to blog.arcee.ai.

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model are shown in the interactive tabs on the model page. The tunable sampler parameters are:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
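As a sketch of how these sampler parameters might be sent to an OpenAI-compatible chat-completions endpoint, the snippet below assembles a request payload. The specific values are illustrative assumptions, not the actual popular configurations; note that `top_k`, `repetition_penalty`, and `min_p` are extensions beyond the strict OpenAI schema that many OpenAI-compatible servers nevertheless accept.

```python
import json

def build_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    All sampler values below are illustrative examples, not the
    measured "popular" settings from the model page.
    """
    return {
        "model": "arcee-ai/Llama-3.1-SuperNova-Lite",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "top_p": 0.9,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        # Common extensions supported by many OpenAI-compatible servers:
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    }

payload = build_request("Summarize Llama-3.1-SuperNova-Lite in one sentence.")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed as JSON to the provider's chat-completions URL with an API key in the `Authorization` header.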