mychen76/mistral-7b-merged-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4K · Published: Mar 9, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

mychen76/mistral-7b-merged-slerp is a 7-billion-parameter language model published by mychen76, created by merging OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B with the slerp method. The model builds on the Mistral architecture, targets general language understanding and generation tasks, and achieves an average score of 71.09 on the Open LLM Leaderboard. It is suitable for applications that need balanced performance across varied reasoning and comprehension benchmarks within a 4096-token context window.


Model Overview

mychen76/mistral-7b-merged-slerp is a 7-billion-parameter language model developed by mychen76. It is the result of merging two base models, OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B, using the slerp (spherical linear interpolation) merge method. The merge configuration applies separate interpolation settings to the self-attention and MLP layers, aiming to combine the strengths of its constituent models.
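To make the merge method concrete, here is a minimal sketch of spherical linear interpolation over a pair of weight tensors, in the spirit of mergekit-style slerp merges. The function and the interpolation factor t are illustrative, not the author's exact code; in practice, merge configs typically vary t per layer and use different schedules for self-attention and MLP weights.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors at factor t in [0, 1]."""
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    # Angle between the two weight vectors, computed on normalized copies.
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0)
    theta = torch.acos(dot)
    sin_theta = torch.sin(theta)
    if sin_theta.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        merged = (1.0 - t) * a_flat + t * b_flat
    else:
        merged = (torch.sin((1.0 - t) * theta) * a_flat
                  + torch.sin(t * theta) * b_flat) / sin_theta
    return merged.reshape(a.shape).to(a.dtype)

# Example: merge one (toy-sized) weight matrix at the midpoint.
w_a = torch.randn(64, 64)
w_b = torch.randn(64, 64)
w_merged = slerp(0.5, w_a, w_b)
```

Unlike plain weight averaging, slerp interpolates along the arc between the two weight vectors, which tends to preserve their magnitudes and geometric relationship rather than collapsing them toward a shorter blend.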

Key Capabilities & Performance

This merged model demonstrates strong performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard:

  • Average Score: 71.09
  • AI2 Reasoning Challenge (25-Shot): 67.75
  • HellaSwag (10-Shot): 86.17
  • MMLU (5-Shot): 64.05
  • TruthfulQA (0-shot): 59.85
  • Winogrande (5-shot): 80.19
  • GSM8k (5-shot): 68.54
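(The reported average is the mean of the six benchmark scores: (67.75 + 86.17 + 64.05 + 59.85 + 80.19 + 68.54) / 6 ≈ 71.09.)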

These scores indicate balanced capabilities across reasoning, commonsense inference, language understanding, and mathematical problem-solving. The model operates within a 4096-token context window.

When to Use This Model

This model is a good choice for developers looking for a 7B-parameter model with solid general-purpose performance. Its slerp-merged weights aim to combine the best features of its base models, making it suitable for applications requiring robust language generation and comprehension, particularly where balance across varied reasoning tasks matters. A minimal loading sketch follows.
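As a quick-start sketch, the model can be loaded with the Hugging Face transformers library. The model id is real; the generation settings and hardware assumptions (a GPU and the accelerate package for device_map="auto") are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mychen76/mistral-7b-merged-slerp"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires `accelerate`; places layers on available devices
)

prompt = "Explain spherical linear interpolation of model weights in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt plus generated tokens within the 4096-token context window.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```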