EmbeddedLLM/Mistral-7B-Merge-02-v0

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 20, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

EmbeddedLLM/Mistral-7B-Merge-02-v0 is a 7-billion-parameter language model based on the Mistral-7B-v0.1 architecture, created by EmbeddedLLM. It is an experimental merge of teknium/OpenHermes-2.5-Mistral-7B and Intel/neural-chat-7b-v3-3 using the DARE TIES method, built to test how DARE TIES performs relative to SLERP when combining specialized models.


Model Overview

This model is an experimental 7-billion-parameter merge built on the Mistral-7B-v0.1 base. Its primary purpose is to compare the DARE TIES merging method against SLERP by combining two distinct models: teknium/OpenHermes-2.5-Mistral-7B and Intel/neural-chat-7b-v3-3.
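For orientation, below is a minimal sketch of loading the model with the Hugging Face transformers library. The model card does not specify a prompt format, so the plain-string prompt here is an assumption; merged models typically inherit the templates of their parents (e.g. ChatML from OpenHermes-2.5), which is worth verifying before use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EmbeddedLLM/Mistral-7B-Merge-02-v0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers across GPU(s)/CPU; needs `accelerate`
)

# Plain-string prompt for illustration; the real chat template may differ.
prompt = "Explain model merging in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```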

Key Characteristics

  • Architecture: Based on Mistral-7B-v0.1.
  • Merging Method: Uses the DARE TIES method, with each source model merged at weight 0.5 and density 0.5 (see the sketch after this list).
  • Experimental Focus: Aims to provide a direct comparison of DARE TIES performance against SLERP in model merging.
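To make the mechanics concrete, here is a toy sketch of the DARE TIES idea in plain NumPy: DARE randomly drops a fraction (1 − density) of each model's parameter delta and rescales the survivors, and a TIES-style step elects a per-parameter majority sign before averaging the deltas that agree with it. This is a simplified illustration over made-up tensors, not mergekit's actual implementation; only the 0.5 weight and density values mirror the configuration described above.

```python
import numpy as np

def dare(delta, density, rng):
    """DARE: Drop a random (1 - density) fraction of the delta's
    entries And REscale survivors by 1/density, preserving the
    delta's expected value."""
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

def dare_ties(base, tuned_models, weights, density, rng):
    """Simplified TIES-style combine over DARE-sparsified deltas:
    elect a majority sign per parameter, then average only the
    weighted deltas whose sign agrees with the elected one."""
    deltas = [w * dare(m - base, density, rng)
              for w, m in zip(weights, tuned_models)]
    elected = np.sign(sum(deltas))        # per-parameter majority sign
    merged = np.zeros_like(base)
    agree_count = np.zeros_like(base)
    for d in deltas:
        agrees = np.sign(d) == elected
        merged += np.where(agrees, d, 0.0)
        agree_count += agrees
    merged /= np.maximum(agree_count, 1)  # mean over agreeing models
    return base + merged

rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))                       # stand-in base weight
tuned_a = base + rng.normal(scale=0.1, size=(4, 4))  # e.g. OpenHermes-2.5
tuned_b = base + rng.normal(scale=0.1, size=(4, 4))  # e.g. neural-chat-v3-3
merged = dare_ties(base, [tuned_a, tuned_b],
                   weights=[0.5, 0.5], density=0.5, rng=rng)
print(np.abs(merged - base).mean())  # merged stays close to the base
```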

Performance Insights

Preliminary results on the Open LLM Leaderboard show this DARE TIES merge scoring slightly below its SLERP-merged counterpart (Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp), although the README notes that further tuning of the DARE TIES parameters may improve results. Selected benchmark comparisons:

  • Average: 70.69 (DARE TIES) vs 71.38 (SLERP)
  • MMLU: 64.1 (DARE TIES) vs 64.26 (SLERP)
  • TruthfulQA: 60.52 (DARE TIES) vs 62.78 (SLERP)

Use Cases

This model is particularly useful for researchers and developers interested in:

  • Model Merging Research: Exploring and comparing different model merging techniques like DARE TIES and SLERP.
  • Performance Analysis: Evaluating how different merging strategies impact benchmark performance across various tasks.