Gille/StrangeMerges_3-7B-slerp

Task: Text generation · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Jan 27, 2024 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1 · Open weights

Gille/StrangeMerges_3-7B-slerp is a 7 billion parameter language model created by Gille, formed by merging FelixChao/WestSeverus-7B-DPO-v2 and Gille/StrangeMerges_1-7B-slerp using the slerp (spherical linear interpolation) method. The model demonstrates strong general reasoning, achieving an average score of 74.57 on the Open LLM Leaderboard, and is suitable for a variety of general-purpose language generation tasks, particularly those benefiting from balanced performance across multiple benchmarks.


Overview

Gille/StrangeMerges_3-7B-slerp is a 7 billion parameter language model developed by Gille. It is a product of merging two distinct models: FelixChao/WestSeverus-7B-DPO-v2 and Gille/StrangeMerges_1-7B-slerp. This merge was performed using the slerp (spherical linear interpolation) method, a technique often employed to combine the strengths of different models.
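Unlike a plain linear average, slerp interpolates along the arc between two weight vectors, preserving their magnitude. The sketch below shows the core operation on flattened weight arrays; it is an illustration of the general technique, not the actual mergekit implementation used to produce this model, and the fallback threshold is an assumption:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t is the interpolation factor in [0, 1]; v0 and v1 are flattened
    weight arrays from the two source models.
    """
    # Normalize copies to measure the angle between the vectors.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)

    # Nearly parallel vectors: fall back to linear interpolation,
    # since sin(theta) would be numerically unstable.
    if abs(dot) > 1.0 - eps:
        return (1 - t) * v0 + t * v1

    theta = np.arccos(dot)
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1
```

At t=0 the result is exactly the first model's weights, at t=1 the second's; intermediate values trace the arc between them.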

Key Capabilities

This model exhibits robust performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. Its key capabilities include:

  • General Reasoning: Achieved 70.82 on the AI2 Reasoning Challenge (25-shot).
  • Common Sense Reasoning: Scored 87.79 on HellaSwag (10-shot) and 82.56 on Winogrande (5-shot).
  • Knowledge & Understanding: Demonstrated 65.12 on MMLU (5-shot) and 68.86 on TruthfulQA (0-shot).
  • Mathematical Reasoning: Performed well with 72.25 on GSM8k (5-shot).

Overall, the model achieves an average score of 74.57 on the Open LLM Leaderboard, indicating strong, balanced general-purpose capability.

Good For

StrangeMerges_3-7B-slerp is well-suited for applications requiring a versatile 7B parameter model with solid performance across various reasoning and language understanding tasks. Its balanced benchmark results suggest it can be effectively used for:

  • General text generation and completion.
  • Question answering and information extraction.
  • Tasks benefiting from strong common sense and mathematical reasoning.
  • Serving as a base for further fine-tuning in specific applications where a robust generalist model is desired.
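For the use cases above, the model can be loaded like any standard Hugging Face checkpoint. A minimal sketch with the transformers library (assumes a GPU with enough memory for 7B weights and the accelerate package for `device_map="auto"`; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gille/StrangeMerges_3-7B-slerp"

# Load tokenizer and model; device_map="auto" places weights across
# available devices (requires the accelerate package).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

prompt = "Explain spherical linear interpolation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```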