MaziyarPanahi/TheTop-5x7B-Instruct-S5-v0.1
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 12, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

MaziyarPanahi/TheTop-5x7B-Instruct-S5-v0.1 is a 7 billion parameter instruction-tuned language model created by MaziyarPanahi. It was built with mergekit by merging top-performing 7B models and applying spherical linear interpolation (SLERP) to others. The model targets general instruction-following tasks, achieves an average score of 75.14 on the Open LLM Leaderboard, and offers balanced reasoning, common sense, and factual recall within its 4096-token context window.


Overview

MaziyarPanahi/TheTop-5x7B-Instruct-S5-v0.1 is a 7 billion parameter instruction-tuned language model developed by MaziyarPanahi. This model was constructed using mergekit, a toolkit for merging pre-trained language models, by combining several top-performing 7B models and applying Spherical Linear Interpolation (SLERP) to others. This merging strategy aims to leverage the strengths of multiple base models to achieve enhanced overall performance.
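
The card does not publish the exact merge recipe, so the sketch below is only illustrative: a minimal PyTorch implementation of SLERP between two weight tensors, assuming both checkpoints share the same architecture and parameter shapes. The function name, epsilon, and nearly-parallel fallback are choices made here, not mergekit's actual code.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors at ratio t in [0, 1]."""
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    # Angle between the two weight vectors on the unit hypersphere
    dot = torch.dot(a_flat / (a_flat.norm() + eps), b_flat / (b_flat.norm() + eps))
    omega = torch.acos(dot.clamp(-1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel vectors: SLERP degenerates to plain linear interpolation
        merged = (1 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        merged = (torch.sin((1 - t) * omega) / sin_omega) * a_flat \
               + (torch.sin(t * omega) / sin_omega) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

# Applied parameter by parameter across two state dicts, e.g.:
# merged_state[k] = slerp(0.5, model_a_state[k], model_b_state[k])
```

In practice mergekit drives this per-layer (often with a different interpolation ratio per layer group) from a merge configuration file; the t = 0.5 above is just a placeholder, not the ratio used for this model.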

Key Capabilities

  • General Instruction Following: Designed to respond effectively to a wide range of user instructions (see the usage sketch after this list).
  • Reasoning and Common Sense: Achieves 72.53 on AI2 Reasoning Challenge (25-shot) and 88.71 on HellaSwag (10-shot).
  • Factual Knowledge: Scores 67.58 on TruthfulQA (0-shot) and 65.01 on MMLU (5-shot).
  • Problem Solving: Demonstrates capability in mathematical reasoning with 70.81 on GSM8k (5-shot).
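
As a concrete starting point, here is a hedged loading-and-generation sketch using the transformers library. The model ID comes from this card; the chat-template handling is an assumption, since the card does not document which prompt format this merge expects.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/TheTop-5x7B-Instruct-S5-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s); requires accelerate
)

# Build the prompt with the tokenizer's bundled chat template. The exact
# template this merge expects is not documented on the card, so this is a
# reasonable default rather than a guarantee.
messages = [{"role": "user", "content": "Explain SLERP model merging in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```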

Performance Highlights

The model's performance is evaluated on the Open LLM Leaderboard, achieving an average score of 75.14. Notable individual benchmark results include:

  • AI2 Reasoning Challenge (25-shot): 72.53
  • HellaSwag (10-shot): 88.71
  • MMLU (5-shot): 65.01
  • TruthfulQA (0-shot): 67.58
  • Winogrande (5-shot): 86.19
  • GSM8k (5-shot): 70.81

When to Use This Model

This model is suitable for applications that need a capable 7B instruction-following model with balanced performance across general-purpose tasks. Its merged lineage suggests a robust foundation for diverse conversational and analytical use cases, particularly where a 4096-token context window is sufficient.
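
Because the window tops out at 4096 tokens, a quick length check before generation can help avoid silent truncation. The helper below is a minimal sketch introduced here for illustration (fits_in_context is a hypothetical name, not part of any library API).

```python
from transformers import AutoTokenizer

MAX_CTX = 4096  # the model's context window, per the card
tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/TheTop-5x7B-Instruct-S5-v0.1")

def fits_in_context(prompt: str, max_new_tokens: int = 256) -> bool:
    """Check that the prompt plus the generation budget stays within the window."""
    n_prompt_tokens = len(tokenizer(prompt).input_ids)
    return n_prompt_tokens + max_new_tokens <= MAX_CTX
```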