Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base

8B parameters · FP8 tensor type · 32768 context length · License: llama3.1

Model Overview

Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base is an 8-billion-parameter language model derived from the Llama-3.1 family. It was created by Joseph717171 using the mergekit tool, specifically employing the TIES (TrIm, Elect Sign & Merge) merge method.

Key Differentiator: Enhanced Instruction Following

This model's primary innovation lies in its merge strategy: it combines arcee-ai/Llama-3.1-SuperNova-Lite with its base model, meta-llama/Llama-3.1-8B. Joseph717171 refined the TIES merge by incorporating the density parameter alongside weight, a technique inspired by successful merges such as RomboDawg's Replete-AI models. This adjustment proved crucial for restoring and improving the instruction-following capabilities that are sometimes diminished in merged models.

Merge Details

The TIES merge was performed with a weight of 1 and a density of 1 for the instruct model relative to the base. After merging, the configuration files were replaced with those of the original instruct model to ensure consistent behavior.
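
To make the merge recipe concrete, the sketch below shows what a mergekit TIES configuration with these settings could look like. It is an illustration, not the author's published config; the dtype line in particular is an assumption.

```yaml
# Hypothetical mergekit config: TIES merge of the instruct model onto
# the Llama-3.1-8B base, with weight=1 and density=1 as described above.
models:
  - model: arcee-ai/Llama-3.1-SuperNova-Lite
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: meta-llama/Llama-3.1-8B
dtype: bfloat16  # assumption; not stated in the card
```

With mergekit installed, a config like this runs via `mergekit-yaml config.yaml ./output-model`.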

Performance Metrics

Evaluations on the Open LLM Leaderboard show the following results:

  • Average Score: 43.07
  • IFEval (0-shot): 80.96
  • BBH (3-shot): 51.10
  • MATH Lvl 5 (4-shot): 15.56
  • GPQA (0-shot): 30.96
  • MuSR (0-shot): 41.01
  • MMLU-PRO (5-shot): 38.80

These scores indicate solid performance across a range of benchmarks, with particularly strong results on instruction following (IFEval).
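
For completeness, the model loads like any other Llama-3.1 checkpoint from the Hugging Face Hub. The snippet below is a minimal usage sketch with the transformers library; the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # spread weights across available devices
)

# The merge reuses the instruct model's config files, so the standard
# Llama-3.1 chat template should apply.
messages = [{"role": "user", "content": "Explain TIES merging in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```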