Gille/StrangeMerges_21-7B-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 12, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Gille/StrangeMerges_21-7B-slerp is a 7-billion-parameter language model created by Gille through a slerp merge of StrangeMerges_20-7B-slerp and NeuTrixOmniBe-7B-model-remix. The model targets general language tasks, and its merged weights achieve an average score of 76.29 on the Open LLM Leaderboard. It performs strongly across several benchmarks, including 88.95 on HellaSwag and 84.61 on Winogrande, making it suitable for applications that require robust language understanding and generation.


Model Overview

Gille/StrangeMerges_21-7B-slerp is a 7-billion-parameter language model developed by Gille. It was created with a spherical linear interpolation (slerp) merge, combining two base models: Gille/StrangeMerges_20-7B-slerp and Kukedlc/NeuTrixOmniBe-7B-model-remix. This merging technique aims to blend the strengths of its constituent models, offering balanced performance across a range of tasks.
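A slerp merge interpolates each pair of corresponding weight tensors along the great-circle arc between them rather than mixing them linearly. The sketch below is a minimal NumPy illustration of the idea, not the exact implementation used to build this model:

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the arc between a and b; at t=0 it returns a,
    at t=1 it returns b. Falls back to plain linear interpolation
    when the vectors are nearly colinear, where slerp is unstable.
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    if abs(dot) > 0.9995:  # nearly colinear: lerp is numerically safer
        return (1 - t) * a + t * b
    theta = np.arccos(dot)           # angle between the weight vectors
    sin_theta = np.sin(theta)
    w_a = np.sin((1 - t) * theta) / sin_theta
    w_b = np.sin(t * theta) / sin_theta
    return (w_a * a_flat + w_b * b_flat).reshape(a.shape)

# Toy tensors standing in for one layer's weights from each parent model
a = np.array([[1.0, 0.0], [0.5, 0.5]])
b = np.array([[0.0, 1.0], [0.5, -0.5]])
merged = slerp(0.5, a, b)
```

In a real merge this function would be applied tensor-by-tensor across both parent checkpoints, with `t` controlling how far the result leans toward the second parent.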

Key Capabilities & Performance

This model demonstrates solid performance on the Open LLM Leaderboard, achieving an average score of 76.29. Specific benchmark results highlight its capabilities:

  • AI2 Reasoning Challenge (25-shot): 74.23
  • HellaSwag (10-shot): 88.95
  • MMLU (5-shot): 65.05
  • TruthfulQA (0-shot): 73.81
  • Winogrande (5-shot): 84.61
  • GSM8k (5-shot): 71.11

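The leaderboard average is simply the unweighted mean of the six benchmark scores listed above, which is easy to verify:

```python
# Open LLM Leaderboard scores reported above; the leaderboard
# average is the unweighted mean of the six benchmarks.
scores = {
    "ARC (25-shot)": 74.23,
    "HellaSwag (10-shot)": 88.95,
    "MMLU (5-shot)": 65.05,
    "TruthfulQA (0-shot)": 73.81,
    "Winogrande (5-shot)": 84.61,
    "GSM8k (5-shot)": 71.11,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 76.29
```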
These scores indicate proficiency in reasoning, common sense, language understanding, and mathematical problem solving. The merge configuration applied separate interpolation weights (t values) to parameters matched by self-attention and MLP filters, letting each component of the network lean more heavily toward one parent model.
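Merge tools such as mergekit typically express these per-component t values as filters matched against parameter names. The sketch below illustrates that filtering logic; the function name and the numeric t values are placeholders, since the model card excerpt does not restate the actual configuration:

```python
def t_for_parameter(name, t_self_attn=0.5, t_mlp=0.5, t_default=0.5):
    """Pick an interpolation weight t by parameter name.

    Mimics filter-based merge configs: parameters whose names match
    'self_attn' or 'mlp' get their own t; everything else uses the
    default. All numeric values here are illustrative placeholders.
    """
    if "self_attn" in name:
        return t_self_attn
    if "mlp" in name:
        return t_mlp
    return t_default

# Example: attention weights lean toward one parent, MLP weights toward the other
t_attn = t_for_parameter("model.layers.0.self_attn.q_proj.weight", 0.3, 0.7)
t_mlp = t_for_parameter("model.layers.0.mlp.gate_proj.weight", 0.3, 0.7)
```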

When to Use This Model

StrangeMerges_21-7B-slerp is a versatile 7B model suitable for general-purpose language generation and understanding tasks. Its balanced performance across multiple benchmarks suggests it can be effectively used in applications requiring:

  • Text generation: For creative writing, content creation, or conversational AI.
  • Reasoning tasks: Given its scores on ARC and GSM8k.
  • Question answering: Supported by its performance on TruthfulQA and MMLU.

Developers looking for a robust 7B model with a strong foundation from merged architectures may find this model particularly useful.