arcee-ai/Llama-3.1-SuperNova-Lite

Hugging Face
Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 32k · Concurrency Cost: 1 · Published: Sep 10, 2024 · License: llama3 · Architecture: Transformer

Llama-3.1-SuperNova-Lite is an 8 billion parameter model developed by Arcee.ai, based on the Llama-3.1-8B-Instruct architecture with a 32768-token context length. It is a distilled version of the larger Llama-3.1-405B-Instruct, trained on offline logits from the teacher to retain high performance in a compact form. It excels at instruction following and domain-specific adaptation, making it well suited for organizations that need efficient, high-performance LLMs.


Overview

Arcee.ai's Llama-3.1-SuperNova-Lite is an 8 billion parameter language model built upon the Llama-3.1-8B-Instruct architecture. It features a substantial 32768-token context window, enabling it to process and generate longer, more complex sequences of text.

Key Capabilities & Distillation

This model is a distilled variant of the much larger Llama-3.1-405B-Instruct, produced with a distillation pipeline that trains against offline logits extracted from the 405B parameter teacher. This process lets Llama-3.1-SuperNova-Lite retain much of the teacher's performance and its strong instruction-following behavior while remaining far more compact and efficient to serve. The instruction dataset used for training was generated with EvolKit, which improves the quality and diversity of the training instructions across tasks.
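The offline-logit distillation described above can be illustrated with a generic knowledge-distillation objective: the student's temperature-softened output distribution is pushed toward the teacher's via a KL-divergence loss. This is a minimal NumPy sketch of that standard technique, not Arcee.ai's actual pipeline; the toy logits are invented for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """Temperature-scaled KL(teacher || student), averaged over positions.

    Generic knowledge-distillation loss (illustrative only): with offline
    distillation, teacher_logits are precomputed once and reused for every
    student training step, so the large teacher never runs during training.
    """
    p = softmax(np.asarray(teacher_logits, dtype=float) / temperature)
    q = softmax(np.asarray(student_logits, dtype=float) / temperature)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(temperature ** 2 * kl.mean())

teacher = np.array([[4.0, 1.0, -2.0]])   # toy "teacher" logits over a 3-token vocab
student = np.array([[3.5, 1.2, -1.5]])   # toy "student" logits
print(distillation_loss(student, teacher))
```

The loss is zero when the student matches the teacher exactly and grows as their softened distributions diverge, which is what drives the student toward the teacher's behavior.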

Performance & Use Cases

Llama-3.1-SuperNova-Lite demonstrates strong performance in both benchmark evaluations and practical applications. Its compact size combined with the power derived from its larger counterpart makes it an ideal choice for organizations seeking high-performance language models with reduced resource requirements. It is particularly well-suited for scenarios demanding robust instruction-following and adaptability to specific domains.

Benchmark Highlights

Evaluations on the Open LLM Leaderboard show competitive results:

  • IFEval (0-shot): 80.17
  • BBH (3-shot): 31.57
  • MMLU-PRO (5-shot): 31.97

For more details on its training methodology, refer to blog.arcee.ai.

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model are shown in the interactive tabs on the model page. The tunable sampler parameters are:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
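As a sketch of how these sampler parameters might be sent to an OpenAI-compatible chat-completions endpoint, the snippet below assembles a request payload. The specific values are illustrative assumptions, not the actual popular configurations; note that `top_k`, `repetition_penalty`, and `min_p` are extensions beyond the strict OpenAI schema that many OpenAI-compatible servers nevertheless accept.

```python
import json

def build_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    All sampler values below are illustrative examples, not the
    measured "popular" settings from the model page.
    """
    return {
        "model": "arcee-ai/Llama-3.1-SuperNova-Lite",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "top_p": 0.9,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        # Common extensions supported by many OpenAI-compatible servers:
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    }

payload = build_request("Summarize Llama-3.1-SuperNova-Lite in one sentence.")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed as JSON to the provider's chat-completions URL with an API key in the `Authorization` header.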