allknowingroger/Marco-01-slerp1-7B
Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Nov 22, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

allknowingroger/Marco-01-slerp1-7B is a 7.6 billion parameter language model created by allknowingroger by merging AIDC-AI/Marco-o1 and allknowingroger/HomerSlerp1-7B with the SLERP method. The merge is configured with a V-shaped curve for parameter blending, aiming to leverage Hermes for the input/output layers and WizardMath for the middle layers. It is designed for general language tasks, and its merge strategy suggests a focus on balanced performance across various domains.


Model Overview

allknowingroger/Marco-01-slerp1-7B is a 7.6 billion parameter language model resulting from a merge of two pre-trained models: AIDC-AI/Marco-o1 and allknowingroger/HomerSlerp1-7B. This model was created using the SLERP (Spherical Linear Interpolation) merge method, which interpolates between the weights of two models along a spherical path rather than a straight line, producing a smoother blend of their parameters.
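To make the interpolation concrete, the sketch below implements the standard SLERP formula for a pair of weight tensors. It is a minimal illustration of the technique, not mergekit's exact implementation; the function name, epsilon threshold, and linear-interpolation fallback are illustrative assumptions.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0 (e.g. one parent model's weights), t=1 returns v1
    (the other parent's weights); intermediate t blends along the arc
    between the two flattened, normalized weight vectors.
    """
    v0_flat, v1_flat = v0.ravel(), v1.ravel()

    # Angle between the two weight vectors on the unit sphere
    v0_unit = v0_flat / (np.linalg.norm(v0_flat) + eps)
    v1_unit = v1_flat / (np.linalg.norm(v1_flat) + eps)
    dot = np.clip(np.dot(v0_unit, v1_unit), -1.0, 1.0)
    theta = np.arccos(dot)

    if np.sin(theta) < eps:
        # Nearly colinear weights: fall back to plain linear interpolation
        blended = (1.0 - t) * v0_flat + t * v1_flat
    else:
        blended = (np.sin((1.0 - t) * theta) * v0_flat
                   + np.sin(t * theta) * v1_flat) / np.sin(theta)
    return blended.reshape(v0.shape)
```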

Merge Configuration

The merge used a configuration designed to blend the characteristics of the constituent models. The `parameters` section of the mergekit YAML specifies a V-shaped interpolation curve, indicating that the model aims to incorporate strengths from "Hermes" in its input and output layers while integrating "WizardMath" into its middle layers. This layer-wise blending is intended to optimize performance across a range of linguistic tasks.
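The exact YAML for this merge is not reproduced here; the sketch below shows what a mergekit SLERP configuration with a V-shaped interpolation curve typically looks like. The model names come from this card, but the layer ranges, `t` values, base model, and dtype are illustrative assumptions, not the actual settings used for Marco-01-slerp1-7B.

```python
import yaml  # PyYAML

# Illustrative mergekit-style SLERP config expressed as a Python dict.
merge_config = {
    "slices": [{
        "sources": [
            # Layer range is an assumption for a 7B-class model.
            {"model": "AIDC-AI/Marco-o1", "layer_range": [0, 28]},
            {"model": "allknowingroger/HomerSlerp1-7B", "layer_range": [0, 28]},
        ],
    }],
    "merge_method": "slerp",
    "base_model": "AIDC-AI/Marco-o1",
    "parameters": {
        # V-shaped curve: t rises toward 1 in the middle layers (second
        # model dominates there) and falls back to 0 at input/output layers.
        "t": [0.0, 0.5, 1.0, 0.5, 0.0],
    },
    "dtype": "bfloat16",
}

# Emit the equivalent YAML that mergekit would consume.
print(yaml.dump(merge_config, sort_keys=False))
```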

Performance Metrics

Evaluations on the Open LLM Leaderboard report the following scores (the first entry is the average of the six benchmarks):

  • Average Score: 29.49
  • IFEval (0-shot): 46.81
  • BBH (3-shot): 36.23
  • MATH Lvl 5 (4-shot): 31.57
  • GPQA (0-shot): 8.95
  • MuSR (0-shot): 14.65
  • MMLU-PRO (5-shot): 38.70

Detailed evaluation results are available on the Open LLM Leaderboard.

Intended Use Cases

Given its merged architecture, Marco-01-slerp1-7B is suited to applications where a 7B parameter model with a balanced performance profile is sufficient. Its merge strategy, which draws on elements of "Hermes" and "WizardMath", suggests strengths in tasks that combine general language understanding with mathematical or reasoning ability, though the actual balance depends on the characteristics of the blended models.
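As a practical starting point, the snippet below sketches loading the model with the Hugging Face transformers library. It assumes the merged checkpoint loads through the standard `AutoModelForCausalLM`/`AutoTokenizer` APIs and ships a chat template inherited from its parent models; verify the prompt format against the model files before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allknowingroger/Marco-01-slerp1-7B"

# Load tokenizer and model; device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Chat-style prompt; the template is assumed, check the model card.
messages = [
    {"role": "user", "content": "Explain spherical linear interpolation in one paragraph."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```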