Gille/StrangeMerges_10-7B-slerp
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 29, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Gille/StrangeMerges_10-7B-slerp is a 7 billion parameter language model created by Gille, formed by merging flemmingmiguel/MBX-7B-v3 and Gille/StrangeMerges_9-7B-dare_ties with the slerp merge method. The model demonstrates strong general reasoning capabilities, achieving an average score of 74.77 on the Open LLM Leaderboard across various benchmarks. With a 4096 token context length, it is suitable for a range of general-purpose text generation and understanding tasks.


StrangeMerges_10-7B-slerp Overview

StrangeMerges_10-7B-slerp is a 7 billion parameter language model developed by Gille. It is a product of merging two distinct models, flemmingmiguel/MBX-7B-v3 and Gille/StrangeMerges_9-7B-dare_ties, utilizing a slerp (spherical linear interpolation) merge method via LazyMergekit. This merging technique allows for a nuanced combination of the source models' strengths.
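LazyMergekit merges of this kind are driven by a small YAML configuration passed to mergekit. The sketch below shows the typical shape of a slerp merge between the two source models; the layer ranges, the choice of base model, and the per-filter `t` schedules are illustrative placeholders, not the actual values used for this merge (the `dtype: bfloat16` setting is stated in the configuration details below).

```yaml
# Illustrative mergekit slerp config; layer_range, base_model,
# and t values are assumptions, not the published settings.
slices:
  - sources:
      - model: flemmingmiguel/MBX-7B-v3
        layer_range: [0, 32]
      - model: Gille/StrangeMerges_9-7B-dare_ties
        layer_range: [0, 32]
merge_method: slerp
base_model: flemmingmiguel/MBX-7B-v3
parameters:
  t:
    - filter: self_attn        # interpolation schedule for attention layers
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp              # separate schedule for MLP layers
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5               # default for all remaining tensors
dtype: bfloat16
```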

Key Capabilities & Performance

The model exhibits robust performance across several benchmarks on the Open LLM Leaderboard, achieving an average score of 74.77. Specific benchmark results include:

  • AI2 Reasoning Challenge (25-Shot): 72.35
  • HellaSwag (10-Shot): 88.30
  • MMLU (5-Shot): 64.87
  • TruthfulQA (0-shot): 69.49
  • Winogrande (5-shot): 83.50
  • GSM8k (5-shot): 70.13

These scores indicate strong general reasoning, common sense, and language understanding abilities, making it a versatile choice for various applications. The model operates with a context length of 4096 tokens.

Configuration Details

The merge configuration involved specific layer ranges for each source model and distinct t values for self-attention and MLP layers, indicating a fine-tuned approach to combining their characteristics. The model uses bfloat16 for its dtype.
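To make the merge method concrete, here is a minimal NumPy sketch of spherical linear interpolation as applied per weight tensor, where `t` is the blend factor (0 keeps the first model, 1 keeps the second). This is an illustration of the general slerp formula, not mergekit's internal implementation.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the arc between v0 and v1 rather than the
    straight line, which better preserves weight magnitudes.
    Falls back to plain linear interpolation when the tensors are
    nearly colinear and the spherical formula becomes unstable.
    """
    v0_flat = v0.ravel().astype(np.float64)
    v1_flat = v1.ravel().astype(np.float64)
    n0 = np.linalg.norm(v0_flat)
    n1 = np.linalg.norm(v1_flat)
    # Cosine of the angle between the two (normalized) tensors.
    dot = np.clip(np.dot(v0_flat / n0, v1_flat / n1), -1.0, 1.0)
    if 1.0 - abs(dot) < eps:  # nearly parallel: lerp is stable here
        return (1.0 - t) * v0 + t * v1
    omega = np.arccos(dot)    # angle between the tensors
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * v0 + (np.sin(t * omega) / s) * v1
```

At `t = 0` the result is exactly the first tensor and at `t = 1` the second, which is why per-layer `t` schedules let a merge lean on different source models for different parts of the network.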