akjindal53244/Llama-3.1-Storm-8B

  • Parameters: 8B
  • Quantization: FP8
  • Context length: 32,768 tokens
  • Released: Aug 12, 2024
  • License: llama3.1
  • Availability: Public (hosted on Hugging Face)
Overview

Llama-3.1-Storm-8B: Enhanced 8B Generalist Model

Llama-3.1-Storm-8B is an 8-billion-parameter model developed by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, and Akshita Sukhlecha. It is built on Meta AI's Llama-3.1-8B-Instruct and supports a 32,768-token context length. The model demonstrates significant performance improvements over both its base model and Hermes-3-Llama-3.1-8B across a range of benchmarks.

Key Enhancements & Capabilities

This model's superior performance is attributed to a three-step process:

  • Self-Curation: Approximately 1 million high-quality examples were selected from a pool of 2.8 million open-source examples, using annotations of each example's educational value and difficulty produced by a Small Language Model (SLM).
  • Targeted Fine-tuning: Uses the Spectrum method, which accelerates training by fine-tuning only the 50% of layer modules with the highest signal-to-noise ratio (SNR) while freezing the rest.
  • Model Merging: The fine-tuned model was merged with Llama-Spark using the SLERP method, blending characteristics from both parent models.
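The fine-tuning and merging steps above can be sketched in miniature. The snippet below is an illustrative toy, not the actual Spectrum or mergekit implementation: `select_trainable_modules` picks the top half of modules by a (hypothetical, precomputed) SNR score, and `slerp` spherically interpolates two flat weight vectors as the SLERP merge would do per tensor.

```python
import math

def select_trainable_modules(snr_by_module, fraction=0.5):
    """Spectrum-style targeting: keep the top `fraction` of modules
    (ranked by SNR) trainable; everything else stays frozen."""
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k])

def slerp(w_a, w_b, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in w_a))
    norm_b = math.sqrt(sum(x * x for x in w_b))
    dot = sum(a * b for a, b in zip(w_a, w_b)) / max(norm_a * norm_b, eps)
    dot = max(-1.0, min(1.0, dot))  # clamp for numerical safety
    theta = math.acos(dot)
    if theta < eps:  # nearly parallel vectors: plain lerp is fine
        return [(1 - t) * a + t * b for a, b in zip(w_a, w_b)]
    s = math.sin(theta)
    ca = math.sin((1 - t) * theta) / s
    cb = math.sin(t * theta) / s
    return [ca * a + cb * b for a, b in zip(w_a, w_b)]

# Hypothetical per-module SNR scores for illustration only.
snr = {"q_proj": 3.0, "k_proj": 1.0, "v_proj": 2.0, "o_proj": 0.5}
trainable = select_trainable_modules(snr)          # {"q_proj", "v_proj"}
merged = slerp([1.0, 0.0], [0.0, 1.0], t=0.5)      # midpoint on the unit circle
```

Unlike plain averaging, SLERP follows the arc between the two parent weight vectors, which preserves their norm when the parents are similar in magnitude.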

Performance Highlights

Llama-3.1-Storm-8B shows notable absolute gains over Meta-Llama-3.1-8B-Instruct:

  • Improved Instruction Following: IFEval Strict (+3.93%)
  • Enhanced Knowledge-Driven QA: GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)
  • Better Reasoning: ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)
  • Superior Agentic Capabilities: BFCL Overall Acc (+7.92%), BFCL AST Summary (+12.32%)
  • Reduced Hallucinations: TruthfulQA (+9%)

Use Cases

This model is a powerful generalist, particularly useful for:

  • Applications requiring strong instruction following and reasoning.
  • Knowledge-driven question answering systems.
  • Function calling and agentic tasks, as demonstrated by its strong BFCL benchmark results.
  • Developers working with limited computational resources who need high performance from an 8B parameter model.
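For the function-calling use case, the general workflow is to describe the available tools in the prompt and parse a structured tool invocation out of the model's reply. The sketch below is a generic, hypothetical harness: the `<tool_call>` tag format, the `get_weather` tool, and the prompt wording are illustrative assumptions, not the exact format the Llama-3.1-Storm-8B model card specifies.

```python
import json
import re

# Hypothetical tool schema for illustration; real deployments should use
# the prompt format documented in the model card.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Fetch current weather for a city.",
        "parameters": {"city": {"type": "string"}},
    }
]

def build_tool_prompt(user_query, tools):
    """Embed the tool list and the user query into one prompt string."""
    return (
        "You have access to these tools:\n"
        + json.dumps(tools, indent=2)
        + "\nRespond with <tool_call>{...}</tool_call> to invoke one.\n"
        + "User: " + user_query
    )

def parse_tool_call(model_output):
    """Extract the JSON payload from a <tool_call>...</tool_call> span,
    or return None if the model answered in plain text."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL)
    return json.loads(match.group(1)) if match else None

prompt = build_tool_prompt("What's the weather in Paris?", TOOLS)
reply = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
call = parse_tool_call(reply)  # {"name": "get_weather", "arguments": {"city": "Paris"}}
```

Keeping the parser tolerant of plain-text replies (returning `None`) lets the same loop handle both tool invocations and direct answers.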