allknowingroger/HomerSlerp4-7B

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Nov 21, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

allknowingroger/HomerSlerp4-7B is a 7.6 billion parameter language model created by allknowingroger with the SLERP merge method, combining allknowingroger/Qwen2.5-7B-task8 and allknowingroger/HomerSlerp2-7B. The model targets general language tasks, with the merge intended to balance the strengths of its two parents, and its 32,768-token context window makes it suitable for longer inputs.

Model Overview

allknowingroger/HomerSlerp4-7B was created with the SLERP merge method from mergekit, combining two base models: allknowingroger/Qwen2.5-7B-task8 and allknowingroger/HomerSlerp2-7B. The merge configuration uses a V-shaped interpolation curve across layers, weighting HomerSlerp2-7B more heavily in the input and output layers and Qwen2.5-7B-task8 more heavily in the middle layers.
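
SLERP (spherical linear interpolation) treats each pair of corresponding weight tensors as points on a hypersphere and interpolates along the arc between them rather than along a straight line. Below is a minimal PyTorch sketch of the idea under stated assumptions: the `v_curve` schedule, its `lo`/`hi` defaults, and the toy demo are illustrative, not the exact mergekit implementation, and `t` is taken here as the weight on HomerSlerp2-7B so that a V shape matches the layer weighting described above.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns w0, t=1 returns w1; intermediate values follow the
    arc between the two flattened weight vectors.
    """
    v0 = w0.flatten().double()
    v1 = w1.flatten().double()
    # Angle between the two weight vectors.
    cos_theta = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))
    if theta.abs() < eps:
        # Nearly colinear tensors: fall back to plain linear interpolation.
        merged = (1.0 - t) * v0 + t * v1
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1.0 - t) * theta) / sin_theta) * v0 \
               + (torch.sin(t * theta) / sin_theta) * v1
    return merged.reshape(w0.shape).to(w0.dtype)

def v_curve(layer: int, num_layers: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Illustrative V-shaped schedule for the interpolation factor t:
    hi at the first and last layers, dipping to lo in the middle.
    """
    x = layer / max(num_layers - 1, 1)          # layer position in [0, 1]
    return lo + (hi - lo) * abs(2.0 * x - 1.0)  # V shape: hi -> lo -> hi

# Toy demonstration on random tensors standing in for one layer's weights,
# with slerp(t, qwen_task8_weight, homer_slerp2_weight).
num_layers = 28  # Qwen2.5-7B's transformer depth
w_qwen = torch.randn(4, 4)
w_homer = torch.randn(4, 4)
print(slerp(v_curve(0, num_layers), w_qwen, w_homer))   # edge layer: equals w_homer
print(slerp(v_curve(14, num_layers), w_qwen, w_homer))  # middle layer: close to w_qwen
```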

Performance Highlights

Evaluated on the Open LLM Leaderboard, HomerSlerp4-7B shows balanced performance across tasks. Key metrics include:

  • Avg. score: 28.62
  • IFEval (0-shot): 43.74
  • BBH (3-shot): 36.79
  • MATH Lvl 5 (4-shot): 29.53
  • MMLU-PRO (5-shot): 38.58

Detailed evaluation results are available on the Open LLM Leaderboard.

Use Cases

This model is suitable for general-purpose language generation and understanding tasks where a 7.6 billion parameter model with a 32,768-token context window is appropriate. The merge is intended to combine the strengths of its constituent models across diverse applications.
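
As a starting point, the model can be loaded like any other causal language model with the Hugging Face transformers library. This is a minimal sketch, assuming a standard transformers install (with accelerate for `device_map="auto"`) and enough memory for a 7.6B model; the prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allknowingroger/HomerSlerp4-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers over available devices
)

prompt = "Explain spherical linear interpolation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```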