allknowingroger/Qwenslerp2-7B

Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Oct 31, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

allknowingroger/Qwenslerp2-7B is a 7.6 billion parameter language model created by allknowingroger with the SLERP merge method. It combines fblgit/cybertron-v4-qw7B-MGS and Tsunami-th/Tsunami-0.5x-7B-Instruct, using a V-shaped interpolation curve to weight the two source models differently across layers. With a context length of 32,768 tokens, it targets general language tasks and scores an average of 30.42 on the Open LLM Leaderboard.


Model Overview

allknowingroger/Qwenslerp2-7B is a 7.6 billion parameter language model developed by allknowingroger. It was created using the SLERP merge method from mergekit, combining two distinct base models: fblgit/cybertron-v4-qw7B-MGS and Tsunami-th/Tsunami-0.5x-7B-Instruct.

Merge Configuration

The merge used a V-shaped interpolation curve (t: [0, 0.5, 1, 0.5, 0]), which varies the blend ratio with layer depth: one source model dominates the input and output layers (t near 0), while the other dominates the middle layers (t near 1). This layerwise weighting aims to combine each model's strengths at different depths of the network.
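To make the schedule concrete, the sketch below shows how SLERP blends two weight tensors and how the V-shaped anchor points map onto layer indices. This is a plain-NumPy illustration of the technique under our own assumptions (the `slerp` and `v_curve_t` helpers are hypothetical), not mergekit's actual implementation.

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two weight tensors a and b."""
    a_flat, b_flat = a.ravel(), b.ravel()
    # Angle between the two weight vectors, computed on normalized copies.
    a_norm = a_flat / (np.linalg.norm(a_flat) + eps)
    b_norm = b_flat / (np.linalg.norm(b_flat) + eps)
    omega = np.arccos(np.clip(np.dot(a_norm, b_norm), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return (1.0 - t) * a + t * b
    sin_omega = np.sin(omega)
    blended = (np.sin((1.0 - t) * omega) / sin_omega) * a_flat \
            + (np.sin(t * omega) / sin_omega) * b_flat
    return blended.reshape(a.shape)

def v_curve_t(layer_idx: int, num_layers: int) -> float:
    """Map a layer index onto the V-shaped schedule [0, 0.5, 1, 0.5, 0].

    The anchors are interpolated linearly over layer depth: t = 0 at the
    first and last layers (all model A), t = 1 at the middle (all model B).
    """
    anchors = [0.0, 0.5, 1.0, 0.5, 0.0]
    pos = layer_idx / max(num_layers - 1, 1) * (len(anchors) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return anchors[lo] * (1.0 - frac) + anchors[hi] * frac

# Example: t at the first, middle, and last of 28 layers (layer count
# here is illustrative, not taken from the model config).
num_layers = 28
for i in (0, num_layers // 2, num_layers - 1):
    print(f"layer {i}: t = {v_curve_t(i, num_layers):.2f}")
```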

Performance Metrics

Evaluated on the Open LLM Leaderboard, Qwenslerp2-7B achieved an average score of 30.42. Key individual benchmark results include:

  • IFEval (0-shot): 52.94
  • BBH (3-shot): 37.44
  • MATH Lvl 5 (4-shot): 31.87
  • MMLU-PRO (5-shot): 39.06

These scores indicate its capabilities across instruction following (IFEval), multi-step reasoning (BBH), competition-level mathematics (MATH Lvl 5), and broad professional knowledge (MMLU-PRO).

Potential Use Cases

Given its merged architecture and benchmark performance, this model is suitable for:

  • General text generation and understanding tasks (see the loading sketch after this list).
  • Applications requiring instruction following.
  • Exploration of merged model performance for specific tasks.
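For the text generation and instruction-following cases above, a minimal loading sketch with the Hugging Face transformers library might look like the following. The dtype choice and the presence of a chat template are our assumptions (typical for Qwen2-based merges), not details confirmed by the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allknowingroger/Qwenslerp2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit your hardware
    device_map="auto",
)

# Assumes the tokenizer ships a chat template; check the repo if this fails.
messages = [{"role": "user", "content": "Summarize what a SLERP model merge does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```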