Gille/StrangeMerges_13-7B-slerp

Text generation · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Jan 31, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Gille/StrangeMerges_13-7B-slerp is a 7 billion parameter language model created by Gille through a slerp merge of StrangeMerges_12-7B-slerp and speechless-zephyr-code-functionary-7b. The merge is designed to combine the strengths of its constituent models, offering balanced performance across benchmarks. It is suited to general-purpose language tasks and supports a context length of 4,096 tokens.


Model Overview

Gille/StrangeMerges_13-7B-slerp is a 7 billion parameter language model developed by Gille. It is a product of a spherical linear interpolation (slerp) merge, combining two distinct base models: Gille/StrangeMerges_12-7B-slerp and uukuguy/speechless-zephyr-code-functionary-7b. This merging technique aims to leverage the capabilities of both source models, resulting in a versatile language model.
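Slerp interpolates between two weight tensors along the arc of a hypersphere rather than along a straight line, which tends to preserve the geometry of the weights better than plain averaging. A minimal NumPy sketch of the operation (the function name and the near-parallel fallback threshold are illustrative, not taken from the actual merge tooling):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate values follow
    the great-circle arc between the (normalized) tensors.
    """
    v0_flat = v0.ravel()
    v1_flat = v1.ravel()
    # Angle between the tensors, from the normalized dot product.
    cos_omega = np.dot(v0_flat, v1_flat) / (
        np.linalg.norm(v0_flat) * np.linalg.norm(v1_flat)
    )
    cos_omega = np.clip(cos_omega, -1.0, 1.0)
    omega = np.arccos(cos_omega)
    if np.sin(omega) < eps:
        # Nearly parallel tensors: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    # Standard slerp weights, applied elementwise to the original shapes.
    return (np.sin((1.0 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)
```

For unit-norm inputs, the interpolated result stays on the unit sphere, which is the property that distinguishes slerp from naive weight averaging.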

Key Capabilities

  • Merged Architecture: Utilizes a slerp merge method, specifically applying different interpolation values (t) to self-attention and MLP layers, suggesting an attempt to fine-tune the contribution of each base model's components.
  • General-Purpose Performance: Achieves an average score of 66.06 on the Open LLM Leaderboard, indicating solid performance across a range of tasks.
  • Reasoning and Common Sense: Demonstrates capabilities in reasoning (AI2 Reasoning Challenge: 63.82) and common sense understanding (HellaSwag: 84.95, Winogrande: 79.87).
  • Knowledge and Problem Solving: Scores 64.90 on MMLU (Massive Multitask Language Understanding) and 54.21 on GSM8k (mathematical word problems), showcasing its ability to handle complex academic and arithmetic tasks.
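Merges like this are typically produced with mergekit, where per-component interpolation is expressed as `t` schedules filtered by layer type. The following YAML is an illustrative sketch of that pattern only; the layer ranges and `t` values shown here are assumptions for illustration, not the published configuration of this merge:

```yaml
# Illustrative mergekit slerp config (values are assumed, not the actual ones)
slices:
  - sources:
      - model: Gille/StrangeMerges_12-7B-slerp
        layer_range: [0, 32]
      - model: uukuguy/speechless-zephyr-code-functionary-7b
        layer_range: [0, 32]
merge_method: slerp
base_model: Gille/StrangeMerges_12-7B-slerp
parameters:
  t:
    - filter: self_attn        # interpolation schedule for attention weights
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp              # a different schedule for MLP weights
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5               # default for all remaining tensors
dtype: bfloat16
```

Using separate `filter` entries for `self_attn` and `mlp` is what lets a merge weight each base model's attention and feed-forward components differently, as described above.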

Good for

  • Balanced Applications: Ideal for use cases requiring a general-purpose LLM with a balanced performance profile across various benchmarks.
  • Research into Merged Models: Provides an example of a slerp-merged model, useful for developers interested in exploring model merging techniques and their impact on performance.
  • Instruction Following: Given its base models, it is likely to perform well in instruction-following scenarios, though specific instruction-tuning details are not provided in the merge configuration.