Gille/StrangeMerges_23-7B-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Feb 13, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

StrangeMerges_23-7B-slerp is a 7-billion-parameter language model created by Gille by merging paulml/OGNO-7B and Gille/StrangeMerges_21-7B-slerp with the slerp method. The model has a 4096-token context window and achieves an average score of 76.17 on the Open LLM Leaderboard, reflecting strong performance across reasoning and language-understanding benchmarks. It is suited to general-purpose text generation and conversational AI applications.


StrangeMerges_23-7B-slerp: A Merged 7B Language Model

StrangeMerges_23-7B-slerp is a 7 billion parameter model developed by Gille, created through a strategic merge of two base models: paulml/OGNO-7B and Gille/StrangeMerges_21-7B-slerp. This merge was performed using the slerp (spherical linear interpolation) method via LazyMergekit, allowing for a balanced combination of the source models' capabilities.
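Conceptually, slerp interpolates along the arc between two weight tensors rather than along the straight line that plain weight averaging follows, which better preserves the geometry (norms and directions) of the parent weights. The sketch below is a minimal, self-contained illustration of the slerp operation on a single pair of tensors; it is not mergekit's actual implementation (mergekit applies per-layer interpolation factors defined in a YAML config), and the function name and epsilon handling here are illustrative.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
    # Flatten and normalize to measure the angle between the two tensors.
    v0_flat, v1_flat = v0.flatten().float(), v1.flatten().float()
    v0_unit = v0_flat / (v0_flat.norm() + eps)
    v1_unit = v1_flat / (v1_flat.norm() + eps)
    dot = torch.clamp(torch.dot(v0_unit, v1_unit), -1.0, 1.0)
    omega = torch.arccos(dot)  # angle between the two weight directions

    if omega.abs() < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1

    # Standard slerp: sin-weighted combination along the great-circle arc.
    sin_omega = torch.sin(omega)
    s0 = torch.sin((1.0 - t) * omega) / sin_omega
    s1 = torch.sin(t * omega) / sin_omega
    return (s0 * v0_flat + s1 * v1_flat).reshape(v0.shape).to(v0.dtype)
```

In the actual merge, LazyMergekit generated a mergekit slerp configuration listing the two parent models and ran the interpolation layer by layer to produce the final checkpoint.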

Key Capabilities & Performance

This model demonstrates robust performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. It achieves an average score of 76.17, with notable results including:

  • AI2 Reasoning Challenge (25-shot): 73.55
  • HellaSwag (10-shot): 88.90
  • MMLU (5-shot): 64.87
  • TruthfulQA (0-shot): 75.13
  • Winogrande (5-shot): 84.29
  • GSM8k (5-shot): 70.28

These scores indicate strong general reasoning, common-sense, and language-understanding abilities. The model's 4096-token context window suits tasks with moderate context requirements.

Ideal Use Cases

  • General Text Generation: Capable of producing coherent, contextually relevant text for a wide range of prompts (see the usage sketch after this list).
  • Conversational AI: Its performance on reasoning and truthfulness benchmarks suggests suitability for interactive applications.
  • Research and Experimentation: Provides a solid base for further fine-tuning or exploring merged model architectures.
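
For experimentation, the model can be loaded with the Hugging Face transformers library like any other causal LM. A minimal sketch, assuming a GPU with enough memory for 7B weights in fp16; the prompt text and generation parameters below are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gille/StrangeMerges_23-7B-slerp"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the 7B weights around ~14 GB
    device_map="auto",          # place layers automatically across available devices
)

prompt = "Explain spherical linear interpolation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```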