djuna/L3.1-Promissum_Mane-8B-Della-1.5-calc

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Oct 29, 2024 · Architecture: Transformer

djuna/L3.1-Promissum_Mane-8B-Della-1.5-calc is an 8 billion parameter language model merged using the Della method, based on unsloth/Meta-Llama-3.1-8B. It integrates three 8B models: DreadPoor/Spei_Meridiem-8B-model_stock, DreadPoor/Aspire1.1-8B-model_stock, and DreadPoor/Heart_Stolen1.1-8B-Model_Stock. The model supports a 32,768-token context length, targets general language tasks, and has reported evaluation results on reasoning and mathematical benchmarks.


Model Overview

djuna/L3.1-Promissum_Mane-8B-Della-1.5-calc is an 8 billion parameter language model created by djuna through a Della merge using mergekit. It is built upon the unsloth/Meta-Llama-3.1-8B base model and combines the strengths of three distinct 8B models from DreadPoor: Spei_Meridiem-8B-model_stock, Aspire1.1-8B-model_stock, and Heart_Stolen1.1-8B-Model_Stock.

Key Capabilities & Performance

This model was merged with the Della method using the parameters density = 1, lambda = 1.05, and epsilon = 0.04, and supports a context length of 32,768 tokens. On the Open LLM Leaderboard it achieves an average score of 29.18. Notable performance metrics include:

  • IFEval (0-shot): 72.35
  • BBH (3-shot): 34.88
  • MATH Lvl 5 (4-shot): 13.97
  • MMLU-PRO (5-shot): 32.26
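
To give intuition for the merge parameters above, the sketch below illustrates a Della-style drop-and-rescale step on a task vector (fine-tuned weights minus base weights). It is an illustrative simplification, not mergekit's actual implementation: `density` sets the average keep rate, `epsilon` spreads keep probabilities by parameter magnitude, and `lambda` rescales the surviving deltas.

```python
import numpy as np

def della_drop_and_rescale(delta, density=1.0, epsilon=0.04, lam=1.05, seed=0):
    """Illustrative Della-style pruning of a flat task-vector array.

    Larger-magnitude deltas get higher keep probabilities; keep
    probabilities span roughly [density - eps/2, density + eps/2].
    Survivors are rescaled by 1/p (to preserve the expected delta)
    and then by lambda.
    """
    rng = np.random.default_rng(seed)
    n = len(delta)
    # Rank each parameter by absolute magnitude (0 = smallest).
    ranks = np.argsort(np.argsort(np.abs(delta)))
    keep_prob = (density - epsilon / 2) + epsilon * ranks / max(n - 1, 1)
    keep_prob = np.clip(keep_prob, 0.0, 1.0)
    mask = rng.random(n) < keep_prob
    out = np.zeros_like(delta)
    out[mask] = delta[mask] / keep_prob[mask] * lam
    return out
```

With this model's settings (density = 1, a small epsilon), almost every delta survives and the net effect is close to a uniform rescale by lambda = 1.05, which slightly amplifies the merged task vectors.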

When to Use This Model

This model is suitable for general language understanding and generation tasks, particularly where a blend of capabilities from its constituent models is desired. Its performance on IFEval suggests potential for instruction following, while its MATH Lvl 5 score indicates some mathematical reasoning ability. Developers looking for a merged Llama 3.1-based model with specific characteristics from the included DreadPoor models may find this useful.
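
For reference, a mergekit configuration producing a merge like the one described above would plausibly resemble the following sketch. The model names and parameter values are taken from this card; the exact layout of the original config (including the dtype) is an assumption.

```yaml
# Hypothetical reconstruction of the merge config; dtype is assumed.
merge_method: della
base_model: unsloth/Meta-Llama-3.1-8B
models:
  - model: DreadPoor/Spei_Meridiem-8B-model_stock
  - model: DreadPoor/Aspire1.1-8B-model_stock
  - model: DreadPoor/Heart_Stolen1.1-8B-Model_Stock
parameters:
  density: 1
  lambda: 1.05
  epsilon: 0.04
dtype: bfloat16
```

A config of this shape would be run with mergekit's merge command pointed at the YAML file and an output directory.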