allknowingroger/Gemmaslerp2-9B

Text generation · Concurrency cost: 1 · Model size: 9B · Quant: FP8 · Context length: 16k · Published: Oct 29, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

allknowingroger/Gemmaslerp2-9B is a 9 billion parameter language model created by allknowingroger by merging sam-paech/Delirium-v1 and DreadPoor/Emu_Eggs-9B-Model_Stock with the SLERP method. The model has a 16384-token context length, is designed to combine the strengths of its constituent models, and achieves an average score of 33.50 on the Open LLM Leaderboard. It is suited to general language tasks that benefit from the blended capabilities of its parent models.


Model Overview

allknowingroger/Gemmaslerp2-9B is a 9 billion parameter language model developed by allknowingroger. It was created by merging two pre-trained language models, sam-paech/Delirium-v1 and DreadPoor/Emu_Eggs-9B-Model_Stock, using SLERP (Spherical Linear Interpolation), a merging technique that interpolates between two models' weights along an arc rather than a straight line, which tends to preserve the scale of the weights better than plain averaging.
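For reference, the snippet below is a minimal sketch of loading the model with the Hugging Face transformers library, assuming the checkpoint is published on the Hub under the identifier above and exposes the standard causal-LM interface:

```python
# Minimal sketch: load the merged model and run a short generation.
# Assumes the checkpoint is on the Hugging Face Hub under this identifier
# and follows the standard transformers causal-LM interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allknowingroger/Gemmaslerp2-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge was produced in bfloat16
    device_map="auto",
)

prompt = "Explain spherical linear interpolation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```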

Key Capabilities & Performance

This merged model aims to integrate the strengths of its base components. Its Open LLM Leaderboard average is 33.50, the unweighted mean of the six benchmark scores below:

  • IFEval (0-shot): 72.81
  • BBH (3-shot): 42.54
  • MATH Lvl 5 (4-shot): 16.54
  • GPQA (0-shot): 13.65
  • MuSR (0-shot): 19.49
  • MMLU-PRO (5-shot): 35.99

Merge Configuration

The SLERP merge was configured with a specific weighting strategy: a V-shaped curve for the t parameter, which weights sam-paech/Delirium-v1 more heavily in the input and output layers and lets DreadPoor/Emu_Eggs-9B-Model_Stock influence the middle layers. The merge was performed in the bfloat16 data type; a sketch of how such a layerwise SLERP schedule works is shown below.
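The following is an illustrative Python sketch of SLERP merging with a layerwise t schedule. It is not the exact mergekit implementation used for this model; the layer count and schedule values are assumptions for illustration.

```python
# Illustrative sketch of SLERP weight merging with a layerwise t schedule.
# Not the exact implementation used for this model; layer count and
# schedule values are assumptions for illustration.
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the two weight vectors.
    cos_omega = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
    omega = torch.acos(cos_omega)
    if omega.abs() < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return (1 - t) * v0 + t * v1
    sin_omega = torch.sin(omega)
    merged = (torch.sin((1 - t) * omega) / sin_omega) * a + (torch.sin(t * omega) / sin_omega) * b
    return merged.reshape(v0.shape).to(v0.dtype)

# Layerwise schedule: t is low at the first and last layers (favoring the
# first model there) and peaks in the middle (favoring the second model).
num_layers = 42  # Gemma-2-9B depth; an assumption for this sketch
t_schedule = [1.0 - abs(1.0 - 2.0 * i / (num_layers - 1)) for i in range(num_layers)]

# Per-layer merge (weights_a / weights_b are hypothetical per-layer tensors):
# merged_layer_i = slerp(t_schedule[i], weights_a[i], weights_b[i])
```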

Use Cases

Given its merged architecture and benchmark performance, this model suits a range of natural language processing tasks where a balance of its constituent models' capabilities is desired. Its 16384-token context length supports longer inputs such as multi-page documents; a sketch of guarding against that limit follows.
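As a usage note, the sketch below checks a long input against the 16384-token window before generation. It reuses the tokenizer and model from the earlier loading example; the reserved output budget is an arbitrary assumption.

```python
# Illustrative sketch: keep a long input within the 16384-token context
# window before generating. Reuses `tokenizer` and `model` from the
# loading example above; the output budget is an arbitrary assumption.
MAX_CONTEXT = 16384
OUTPUT_BUDGET = 512  # tokens reserved for the generated continuation

long_document = "..."  # placeholder for a long input text
input_ids = tokenizer(long_document, return_tensors="pt").input_ids

limit = MAX_CONTEXT - OUTPUT_BUDGET
if input_ids.shape[-1] > limit:
    # Truncate from the left so the most recent text is kept.
    input_ids = input_ids[:, -limit:]

outputs = model.generate(input_ids.to(model.device), max_new_tokens=OUTPUT_BUDGET)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```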