s3nh/Mistral_Sonyichi-7B-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Jan 5, 2024 · License: openrail · Architecture: Transformer · Open Weights

s3nh/Mistral_Sonyichi-7B-slerp is a 7-billion-parameter language model created by s3nh by merging SanjiWatsuki/Sonya-7B, EmbeddedLLM/Mistral-7B-Merge-14-v0.1, and SanjiWatsuki/Kunoichi-7B with the SLERP method. Built on the Mistral architecture and optimized for general reasoning tasks, it achieves an average score of 70.52 on the Open LLM Leaderboard across benchmarks including MMLU and HellaSwag, and suits applications requiring robust language understanding and generation within a 4096-token context window.


Model Overview

s3nh/Mistral_Sonyichi-7B-slerp is a 7-billion-parameter language model developed by s3nh and built on the Mistral architecture. It is the result of a SLERP (spherical linear interpolation) merge combining three base models: SanjiWatsuki/Sonya-7B, EmbeddedLLM/Mistral-7B-Merge-14-v0.1, and SanjiWatsuki/Kunoichi-7B.

Key Capabilities & Performance

The model demonstrates strong performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. It achieved an average score of 70.52, with notable results including:

  • HellaSwag (10-shot): 86.43
  • AI2 Reasoning Challenge (25-shot): 67.49
  • MMLU (5-shot): 63.58
  • TruthfulQA (0-shot): 63.25
  • Winogrande (5-shot): 78.53
  • GSM8k (5-shot): 63.84

These scores indicate proficiency in commonsense reasoning, language understanding, and mathematical problem-solving. The merge configuration applied different interpolation weights to the self_attn and mlp layers of the contributing models to achieve this balanced performance.
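The published merge recipe is not reproduced here, but the core SLERP operation is easy to sketch. Below is a minimal PyTorch implementation of the interpolation step as typically used in mergekit-style merges; the `slerp` function and the per-layer factors `T_SELF_ATTN` / `T_MLP` are illustrative assumptions, not the model's actual configuration.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate values follow the
    arc between the two weight vectors rather than the straight line.
    """
    v0_flat = v0.flatten().float()
    v1_flat = v1.flatten().float()
    # Cosine of the angle between the two (normalized) weight vectors.
    dot = torch.dot(v0_flat / v0_flat.norm(),
                    v1_flat / v1_flat.norm()).clamp(-1.0, 1.0)
    if dot.abs() > 0.9995:
        # Nearly colinear: plain linear interpolation is numerically safer.
        merged = (1.0 - t) * v0_flat + t * v1_flat
    else:
        theta = torch.acos(dot)
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1.0 - t) * theta) / sin_theta) * v0_flat \
               + (torch.sin(t * theta) / sin_theta) * v1_flat
    return merged.reshape(v0.shape).to(v0.dtype)

# Hypothetical per-layer schedule: attention and MLP weights can be merged
# with different interpolation factors, as described above. These values
# are illustrative, not the published configuration.
T_SELF_ATTN = 0.4
T_MLP = 0.6
```

Interpolating along the arc between weight vectors, rather than the straight line, preserves the magnitude characteristics of both parents better than naive averaging, which is why SLERP is a common choice for model merging.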

Use Cases

This model is well-suited for general-purpose language generation and understanding tasks where a 7B parameter model with a 4096-token context window is appropriate. Its balanced performance across various benchmarks makes it a versatile choice for applications requiring robust reasoning and factual recall.
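The model can be loaded like any standard Mistral-format checkpoint. The snippet below is a minimal sketch using the Hugging Face transformers library; the prompt and generation settings are illustrative, and hardware permitting, `device_map="auto"` will place the weights on an available GPU.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "s3nh/Mistral_Sonyichi-7B-slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on GPU if one is available
)

prompt = "Explain spherical linear interpolation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Keep the combined prompt and generation length within the 4096-token context window noted above.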