akjindal53244/Llama-3.1-Storm-8B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32K · Published: Aug 12, 2024 · License: llama3.1 · Architecture: Transformer

Llama-3.1-Storm-8B is an 8 billion parameter language model developed by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, and Akshita Sukhlecha, built upon Meta AI's Llama-3.1-8B-Instruct with a 32768 token context length. This model significantly outperforms its base and Hermes-3-Llama-3.1-8B across diverse benchmarks, excelling in instruction following, knowledge-driven QA, reasoning, and function calling. It achieves these enhancements through self-curation, Spectrum-based targeted fine-tuning, and SLERP model merging, making it a powerful generalist model for various applications.


Llama-3.1-Storm-8B: Enhanced 8B Generalist Model

Llama-3.1-Storm-8B is an 8 billion parameter model developed by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, and Akshita Sukhlecha. It is built on Meta AI's Llama-3.1-8B-Instruct and features a 32768 token context length. The model demonstrates significant performance improvements over its base model and Hermes-3-Llama-3.1-8B across a range of benchmarks.
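Since the model inherits the Llama 3.1 chat format, a prompt can be assembled with Meta's published special tokens. The sketch below builds the prompt string by hand for illustration; in practice, `tokenizer.apply_chat_template` from Hugging Face `transformers` with the `akjindal53244/Llama-3.1-Storm-8B` tokenizer is the safer route.

```python
def build_prompt(system: str, user: str) -> str:
    """Format a single-turn chat prompt using the Llama 3.1 template tokens."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Open the assistant turn so generation continues from here.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("You are a helpful assistant.", "What is 2 + 2?")
```

The resulting string can be tokenized and passed to any Llama-3.1-compatible inference stack.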

Key Enhancements & Capabilities

This model's superior performance is attributed to a three-step process:

  • Self-Curation: Approximately 1 million high-quality examples were selected from 2.8 million open-source examples based on their educational value and difficulty level, as annotated by a Small Language Model (SLM).
  • Targeted Fine-tuning: Utilizes the Spectrum method, which accelerates training by selectively targeting 50% of layer modules based on their signal-to-noise ratio (SNR) and freezing the rest.
  • Model Merging: The fine-tuned model was merged with Llama-Spark using the SLERP method, blending characteristics from both parent models.
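The SLERP (spherical linear interpolation) merge in the last step can be sketched as follows. This is a minimal, generic SLERP over flattened weight tensors, not the authors' actual merge pipeline (tools like `mergekit` handle per-layer interpolation factors and edge cases):

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between weight vectors a and b at fraction t."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)          # angle between the two weight directions
    if theta < eps:                 # nearly parallel: fall back to linear interp
        return (1.0 - t) * a + t * b
    s = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b

# Toy example: halfway between two orthogonal "weight" vectors.
merged = slerp(0.5, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Unlike plain averaging, SLERP follows the arc between the two parameter directions, which tends to preserve the geometry of both parents' weight spaces.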

Performance Highlights

Llama-3.1-Storm-8B shows notable absolute gains over Meta-Llama-3.1-8B-Instruct:

  • Improved Instruction Following: IFEval Strict (+3.93%)
  • Enhanced Knowledge-Driven QA: GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)
  • Better Reasoning: ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)
  • Superior Agentic Capabilities: BFCL Overall Acc (+7.92%), BFCL AST Summary (+12.32%)
  • Reduced Hallucinations: TruthfulQA (+9%)

Use Cases

This model is a powerful generalist, particularly useful for:

  • Applications requiring strong instruction following and reasoning.
  • Knowledge-driven question answering systems.
  • Function calling and agentic tasks, with impressive capabilities demonstrated on the BFCL benchmark.
  • Developers working with limited computational resources who need high performance from an 8B parameter model.
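For the function-calling use case, one common pattern is to list tool specifications in the system prompt and parse a JSON tool call from the reply. The tool spec and response format below are hypothetical illustrations; the exact tool-call format Llama-3.1-Storm-8B was trained on should be taken from its model card.

```python
import json

# Hypothetical tool specification for illustration only.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"city": {"type": "string"}},
}]

def system_prompt(tools: list) -> str:
    """Embed tool specs in the system prompt and request a JSON call."""
    return (
        "You have access to the following tools:\n"
        + json.dumps(tools, indent=2)
        + '\nTo call a tool, reply with a JSON object: {"name": ..., "arguments": ...}'
    )

def parse_tool_call(reply: str):
    """Return the tool call dict if the reply is a well-formed call, else None."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if isinstance(call, dict) and {"name", "arguments"} <= call.keys():
        return call
    return None
```

The parser deliberately returns `None` for plain-text replies, letting the application fall back to normal chat handling.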

Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model cover the following samplers (the specific values were not captured in this snapshot):

  • temperature, top_p, top_k
  • frequency_penalty, presence_penalty, repetition_penalty
  • min_p
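These parameter names map directly onto `transformers`-style generation settings. The values below are hypothetical placeholders (the actual user configurations are not shown above), included only to show the shape of such a config:

```python
# Hypothetical sampler configuration; values are placeholders, not the
# settings reported by Featherless users.
sampler_config = {
    "temperature": 0.7,        # softens/sharpens the token distribution
    "top_p": 0.9,              # nucleus sampling cutoff
    "top_k": 40,               # keep only the k most likely tokens
    "frequency_penalty": 0.0,  # penalize tokens by how often they appeared
    "presence_penalty": 0.0,   # penalize tokens that appeared at all
    "repetition_penalty": 1.1, # >1.0 discourages verbatim repetition
    "min_p": 0.05,             # drop tokens below this fraction of the top prob
}
```

Such a dict can be passed as keyword arguments to a generation call or used to build a `GenerationConfig`.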