mychen76/mistral-7b-merged-dare_6x7

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Mar 11, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

The mychen76/mistral-7b-merged-dare_6x7 is a 7 billion parameter language model based on the Mistral-7B-v0.1 architecture, created by mychen76 through a DARE TIES merge of several fine-tuned models. The merge combines the strengths of SamirGPT-v1, Slerp-CM-mist-dpo, EmbeddedLLM/Mistral-7B-Merge-14-v0.2, and Weyaxi/Einstein-v4-7B. The resulting model performs well across standard benchmarks, including an average score of 73.46 on the Open LLM Leaderboard, making it suitable for general-purpose text generation and reasoning tasks.


Overview

The mychen76/mistral-7b-merged-dare_6x7 is a 7 billion parameter language model built upon the Mistral-7B-v0.1 base architecture. It was created by mychen76 using the DARE TIES merge method, combining the capabilities of multiple specialized models. This merging approach integrates contributions from samir-fama/SamirGPT-v1, abacusai/Slerp-CM-mist-dpo, EmbeddedLLM/Mistral-7B-Merge-14-v0.2, and Weyaxi/Einstein-v4-7B to enhance its overall performance.
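The core idea of the DARE step is to randomly drop a fraction of each fine-tune's parameter deltas (its differences from the base model) and rescale the survivors so their expected contribution is preserved, which reduces interference when the deltas are combined. A minimal NumPy sketch of that step is shown below; the function names and the plain averaging at the end are this sketch's assumptions, and the real DARE TIES merge additionally applies TIES-style sign election before combining, which is omitted here for brevity:

```python
import numpy as np

def dare_delta(base: np.ndarray, finetuned: np.ndarray,
               drop_rate: float, rng: np.random.Generator) -> np.ndarray:
    """Drop-And-REscale (DARE): zero out each delta entry with
    probability drop_rate, then rescale survivors by 1/(1 - drop_rate)
    so the expected value of the delta is unchanged."""
    delta = finetuned - base
    keep_mask = rng.random(delta.shape) >= drop_rate  # keep w.p. 1 - p
    return delta * keep_mask / (1.0 - drop_rate)

def dare_merge(base: np.ndarray, finetuned_models: list,
               drop_rate: float = 0.5, seed: int = 0) -> np.ndarray:
    """Illustrative merge: average the sparsified, rescaled deltas and
    add them back to the base (TIES sign election omitted)."""
    rng = np.random.default_rng(seed)
    deltas = [dare_delta(base, ft, drop_rate, rng) for ft in finetuned_models]
    return base + np.mean(deltas, axis=0)
```

Because each surviving delta entry is scaled by 1/(1 - p), the merged weights are an unbiased estimate of what a dense average of the deltas would produce, while the random sparsification decorrelates the contributions of the different fine-tunes.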

Key Capabilities

  • General Text Generation: Capable of generating coherent and contextually relevant text for a wide range of prompts.
  • Reasoning: Achieves a score of 69.62 on the AI2 Reasoning Challenge (25-Shot) and 71.34 on GSM8k (5-shot), indicating proficiency in reasoning and mathematical problem-solving.
  • Commonsense Understanding: Demonstrates strong performance in commonsense reasoning with 87.04 on HellaSwag (10-Shot) and 80.58 on Winogrande (5-shot).
  • Knowledge & Factuality: Scores 65.18 on MMLU (5-Shot) and 66.98 on TruthfulQA (0-shot), reflecting broad world knowledge and a tendency to avoid common falsehoods.

Performance Highlights

Evaluated on the Open LLM Leaderboard, the model achieved an average score of 73.46. Specific benchmark results include:

  • AI2 Reasoning Challenge (25-Shot): 69.62
  • HellaSwag (10-Shot): 87.04
  • MMLU (5-Shot): 65.18
  • TruthfulQA (0-shot): 66.98
  • Winogrande (5-shot): 80.58
  • GSM8k (5-shot): 71.34

Detailed evaluation results are available on the Open LLM Leaderboard and in its accompanying results dataset.
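The reported leaderboard average is simply the unweighted mean of the six benchmark scores listed above, which can be checked directly:

```python
# Per-benchmark scores as reported on the Open LLM Leaderboard
scores = {
    "ARC (25-shot)": 69.62,
    "HellaSwag (10-shot)": 87.04,
    "MMLU (5-shot)": 65.18,
    "TruthfulQA (0-shot)": 66.98,
    "Winogrande (5-shot)": 80.58,
    "GSM8k (5-shot)": 71.34,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 73.46, matching the reported average
```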

When to Use This Model

This model is suitable for developers seeking a robust 7B parameter model for general-purpose applications requiring strong reasoning, commonsense understanding, and factual recall. Its balanced performance across various benchmarks makes it a versatile choice for tasks like content generation, question answering, and conversational AI where a Mistral-based architecture is preferred.
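For developers evaluating the model, a minimal loading sketch is given below. It assumes the model is published on the Hugging Face Hub under this ID and loads with the standard `transformers` Auto classes; the `[INST]` prompt wrapper is an assumption based on common Mistral chat conventions, since the model card does not state a prompt format. The download is gated behind a flag so the script runs without fetching the full weights:

```python
MODEL_ID = "mychen76/mistral-7b-merged-dare_6x7"

def build_prompt(instruction: str) -> str:
    # Mistral-style instruction wrapper; the exact template expected by
    # this merge is an assumption, not confirmed by the model card.
    return f"[INST] {instruction} [/INST]"

DOWNLOAD_WEIGHTS = False  # set True to fetch the full model weights

if DOWNLOAD_WEIGHTS:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    prompt = build_prompt("Explain model merging in one sentence.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Keeping the heavyweight imports and downloads behind the flag makes it easy to sanity-check prompt construction locally before committing to a multi-gigabyte download.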