huihui-ai/Llama-3.1-8B-Fusion-7030

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Sep 18, 2024 · License: llama3.1 · Architecture: Transformer

huihui-ai/Llama-3.1-8B-Fusion-7030 is an 8-billion-parameter Llama 3.1-based merged model, created by huihui-ai, that blends arcee-ai/Llama-3.1-SuperNova-Lite (70%) and mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated (30%). This experimental fusion aims to combine the strengths of both base models while remaining coherent and usable. It shows strong performance on the IF_Eval and GPQA benchmarks, making it suitable for tasks requiring robust instruction following and general knowledge.


Model Overview

huihui-ai/Llama-3.1-8B-Fusion-7030 is an experimental 8 billion parameter language model based on the Llama 3.1 architecture. Developed by huihui-ai, this model is a blend of two distinct Llama-based models: arcee-ai/Llama-3.1-SuperNova-Lite and mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated. The fusion uses a 7:3 ratio, with 70% of the weights from SuperNova-Lite and 30% from the abliterated Meta-Llama-3.1-8B-Instruct model.
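The exact merge script is not published here; a minimal sketch of such a 7:3 linear weight blend, assuming both checkpoints share the Llama 3.1 8B architecture (identical state-dict keys and shapes), could look like the following. The output path and dtype are illustrative, not the author's settings.

```python
# Illustrative 7:3 linear weight merge (not the author's exact script).
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "arcee-ai/Llama-3.1-SuperNova-Lite", torch_dtype=torch.bfloat16
)
donor = AutoModelForCausalLM.from_pretrained(
    "mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated", torch_dtype=torch.bfloat16
)

alpha = 0.7  # 70% SuperNova-Lite, 30% abliterated Instruct
donor_state = donor.state_dict()

# Blend each parameter tensor; keys match because both models
# use the same Llama 3.1 8B architecture.
merged_state = {
    name: alpha * param + (1.0 - alpha) * donor_state[name]
    for name, param in base.state_dict().items()
}

base.load_state_dict(merged_state)
base.save_pretrained("Llama-3.1-8B-Fusion-7030")  # hypothetical output path
```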

Key Characteristics

  • Merged Weights: Combines a Llama-3.1-8B-Instruct-based model from Arcee.ai with an uncensored (abliterated) Llama 3.1 8B Instruct variant; both parents share the same architecture, so only the weights are blended.
  • Experimental Fusion: Part of a series of experiments by huihui-ai to evaluate the impact of different mixing ratios on model performance and coherence.
  • Usability: Despite being a simple weight blend, the model maintains usability and does not produce incoherent or "gibberish" outputs (see the inference sketch after this list).
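Since both parents are instruction-tuned Llama 3.1 models, the merged model can be run with the standard transformers chat API. A minimal sketch follows; the prompt and generation settings are illustrative defaults, not recommendations from the model author.

```python
# Illustrative inference with the merged model via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huihui-ai/Llama-3.1-8B-Fusion-7030"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the Llama 3.1 architecture in two sentences."}
]
# Build the Llama 3.1 chat prompt and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,          # illustrative sampling settings
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```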

Performance Highlights

Evaluations indicate that Llama-3.1-8B-Fusion-7030 achieves competitive scores on several benchmarks:

  • IF_Eval: Achieves 83.10, outperforming both base models and other fusion ratios.
  • GPQA: Scores 32.61, also surpassing its base components and other fusion variants.
  • MMLU Pro, TruthfulQA, BBH: While not leading on these benchmarks, it maintains strong scores, demonstrating balanced capability across reasoning and knowledge tasks.

This model is particularly well-suited for applications requiring a blend of robust instruction following and general knowledge, benefiting from the combined characteristics of its parent models.