Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base

Hosted on Hugging Face · Text generation · 8B parameters · FP8 quantization · 32k context · Published: Oct 2, 2024 · License: llama3.1 · Architecture: Transformer

Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base is an 8 billion parameter language model based on the Llama-3.1 architecture, created by Joseph717171. This model is a merge of arcee-ai/Llama-3.1-SuperNova-Lite with its Llama-3.1-8B base, utilizing the TIES merge method. It is specifically engineered to restore and enhance instruction-following capabilities, making it suitable for tasks requiring precise adherence to prompts.


Model Overview

Joseph717171/Llama-3.1-SuperNova-8B-Lite_TIES_with_Base is an 8 billion parameter language model derived from the Llama-3.1 family. It was created by Joseph717171 using the mergekit tool, specifically employing the TIES (TrIm, Elect Sign & Merge) merge method.

Key Differentiator: Enhanced Instruction Following

This model's primary innovation lies in its merge strategy. It combines arcee-ai/Llama-3.1-SuperNova-Lite with its base model, meta-llama/Llama-3.1-8B. The developer, Joseph717171, refined the TIES merge by incorporating the density parameter alongside weight, a technique inspired by successful merges like RomboDawg's Replete-AI. This approach was crucial for restoring and improving the instruction-following capabilities that can sometimes be diminished in merged models.
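The TIES procedure can be illustrated on toy weight vectors. The sketch below is an illustration of the trim / elect-sign / merge steps only, not mergekit's actual implementation; the function name and toy values are invented for this example.

```python
# Toy sketch of the TIES merge steps (trim, elect sign, merge) on flat
# weight lists. Illustration only -- not mergekit's implementation.

def ties_merge(base, instruct, weight=1.0, density=1.0):
    """Merge one fine-tuned (instruct) model into its base via TIES.

    base, instruct: lists of floats (flattened parameters).
    density: fraction of task-vector entries kept, by magnitude.
    weight: scale applied to the merged task vector.
    """
    # 1. Task vector: delta between the fine-tuned model and its base.
    delta = [i - b for i, b in zip(instruct, base)]

    # 2. Trim: keep only the top `density` fraction by magnitude.
    k = max(1, round(density * len(delta)))
    threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
    trimmed = [d if abs(d) >= threshold else 0.0 for d in delta]

    # 3. Elect sign: with a single donor model the elected sign is just
    #    the sign of each trimmed entry, so every kept entry survives.
    # 4. Merge: add the weighted task vector back onto the base.
    return [b + weight * t for b, t in zip(base, trimmed)]

base = [0.5, -1.0, 2.0, 0.0]
instruct = [0.7, -1.1, 2.0, 0.4]

# With weight=1 and density=1 (the settings used for this model), the
# full task vector is kept, so the merge reproduces the instruct weights.
print(ties_merge(base, instruct))

# With density=0.5, only the two largest deltas survive the trim step.
print(ties_merge(base, instruct, density=0.5))
```

With weight and density both set to 1, as here, the single-donor merge reproduces the instruct model's weights exactly; lower density values would zero out the smallest task-vector entries, pulling those parameters back toward the base.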

Merge Details

The TIES merge was performed with a weight of 1 and density of 1 for the instruct model relative to the base. Post-merge, the configuration files were replaced with those of the original instruct model to ensure consistent behavior.
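A mergekit configuration for this kind of merge might look like the following. This is a hypothetical sketch based on the parameters described above; the author's actual config file is not reproduced here, so treat the exact fields and dtype as assumptions.

```yaml
# Hypothetical mergekit config sketching the described TIES merge.
models:
  - model: meta-llama/Llama-3.1-8B
    # The base model contributes no task vector of its own.
  - model: arcee-ai/Llama-3.1-SuperNova-Lite
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: meta-llama/Llama-3.1-8B
dtype: bfloat16
```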

Performance Metrics

Evaluations on the Open LLM Leaderboard show the following results:

  • Average Score: 43.07
  • IFEval (0-shot): 80.96
  • BBH (3-shot): 51.10
  • MATH Lvl 5 (4-shot): 15.56
  • GPQA (0-shot): 30.96
  • MuSR (0-shot): 41.01
  • MMLU-PRO (5-shot): 38.80
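Assuming the leaderboard average is the simple mean of the six scores, the figure can be checked directly; note that the displayed per-benchmark values are themselves rounded, so the mean of these rounded values (43.065) is consistent with the reported 43.07.

```python
# Check the reported average against the mean of the six benchmark scores.
scores = {
    "IFEval": 80.96,
    "BBH": 51.10,
    "MATH Lvl 5": 15.56,
    "GPQA": 30.96,
    "MuSR": 41.01,
    "MMLU-PRO": 38.80,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")  # 43.065
```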

These scores indicate solid performance across a broad range of benchmarks, with the IFEval result in particular reflecting the merge's focus on instruction following.