InnerI/InnerILLM-7B-slerp

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Feb 12, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

InnerILLM-7B-slerp is a 7 billion parameter language model created by InnerI through a spherical linear interpolation (slerp) merge of OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B. The model achieves an average score of 71.09 on the Open LLM Leaderboard, demonstrating strong performance across a range of reasoning and language understanding tasks. It is suitable for general-purpose applications that need a capable 7B model with a 4096-token context length.


InnerILLM-7B-slerp Overview

InnerILLM-7B-slerp is a 7 billion parameter language model developed by InnerI, created through a spherical linear interpolation (slerp) merge of two base models: OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B. This merging technique aims to combine the strengths of its constituent models.
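Conceptually, slerp interpolates between two weight tensors along the arc of a great circle rather than along a straight line, which preserves the magnitude of the weights better than plain averaging. The following is a minimal NumPy sketch of the operation for intuition only; it is not InnerI's actual merge code:

```python
# Illustrative sketch of spherical linear interpolation (slerp) between
# two weight tensors at mixing factor t (t=0 returns v0, t=1 returns v1).
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two weight tensors of the same shape."""
    # Work on flattened copies so the routine handles tensors of any shape.
    shape = v0.shape
    v0f, v1f = v0.ravel(), v1.ravel()
    v0n = v0f / (np.linalg.norm(v0f) + eps)
    v1n = v1f / (np.linalg.norm(v1f) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    omega = np.arccos(dot)              # angle between the two weight vectors
    if np.sin(omega) < eps:             # nearly parallel: fall back to linear interpolation
        return ((1.0 - t) * v0f + t * v1f).reshape(shape)
    s0 = np.sin((1.0 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return (s0 * v0f + s1 * v1f).reshape(shape)
```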

Key Capabilities & Performance

The model demonstrates competitive performance on the Open LLM Leaderboard, achieving an average score of 71.09. Specific benchmark results include:

  • AI2 Reasoning Challenge (25-shot): 67.58
  • HellaSwag (10-shot): 86.19
  • MMLU (5-shot): 64.15
  • TruthfulQA (0-shot): 59.84
  • Winogrande (5-shot): 80.11
  • GSM8k (5-shot): 68.69

These scores indicate proficiency in commonsense reasoning (ARC, HellaSwag, Winogrande), broad multi-task language understanding (MMLU), truthfulness (TruthfulQA), and grade-school mathematical problem-solving (GSM8k). Separately, the model was evaluated at an average loss of roughly 0.807 using a custom testing script.
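That testing script is not published with the model, so as orientation only, here is a generic sketch of how an average next-token loss can be computed with the transformers library; the evaluation texts below are placeholders:

```python
# Generic sketch of computing an average next-token (cross-entropy) loss
# over a small evaluation set. InnerI's "custom testing script" is not
# published, so the texts here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InnerI/InnerILLM-7B-slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

texts = ["The capital of France is Paris.", "2 + 2 equals 4."]  # placeholder eval texts
losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        # Passing labels makes the model return the mean cross-entropy over tokens.
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())

print(f"average loss: {sum(losses) / len(losses):.4f}")
```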

Merge Configuration

The slerp merge was applied to layers 0-32 of both source models. The interpolation factor t was configured separately for the self_attn and mlp components, with a value of 0.5 for all other parameters, suggesting a balanced blend of the parent models' characteristics.
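Merges of this kind are commonly produced with the mergekit toolkit. A representative slerp configuration consistent with the description above might look like the following; note that the per-layer t schedules for self_attn and mlp are illustrative assumptions, since only the 0.5 default is stated:

```yaml
# Representative mergekit slerp configuration (t schedules are assumed).
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # assumed per-layer schedule
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # assumed per-layer schedule
    - value: 0.5                     # stated default for all other parameters
dtype: bfloat16
```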

Good for

  • General-purpose text generation and understanding tasks.
  • Applications requiring a capable 7B parameter model with a 4096 token context window.
  • Developers looking for a merged model with balanced performance across various benchmarks.
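As a quick start, here is a minimal sketch of running the model with Hugging Face transformers; the sampling settings are illustrative defaults, not tuned recommendations:

```python
# Minimal text-generation sketch using Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InnerI/InnerILLM-7B-slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model within ~16 GB of VRAM
    device_map="auto",
)

prompt = "Explain spherical linear interpolation in one short paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=200,  # stay well inside the 4096-token context window
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```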