martyn/mistral-megamerge-dare-7b

Text Generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 8k · Published: Dec 14, 2023 · License: MIT · Architecture: Transformer · Open weights

The martyn/mistral-megamerge-dare-7b is a 7 billion parameter language model based on the Mistral architecture. It is a mega-merge of seven Mistral-7B variants, including Mistral-7B-Instruct-v0.2 as the base and specialized models such as speechless-code-mistral-7b-v1.0, created with the safetensors-merge-supermario merging tool. It is designed to combine the strengths of its constituent models, offering a versatile foundation for a range of natural language processing tasks.


Model Overview

The martyn/mistral-megamerge-dare-7b is a 7 billion parameter language model that represents a "mega-merge" of seven different Mistral-7B based models. This merge was performed using the safetensors-merge-supermario tool with specific hyperparameters (p=0.12 and lambda=2.1), aiming to consolidate the capabilities of its diverse components.

Key Merged Components

The base model for this merge is mistralai/Mistral-7B-Instruct-v0.2. On top of it, the merge incorporates the following six models:

  • uukuguy/speechless-code-mistral-7b-v1.0
  • AIDC-ai-business/Marcoroni-7B-v3
  • Weyaxi/Seraph-7B
  • rwitz/dec10
  • Intel/neural-chat-7b-v3-3
  • rwitz/go-bruins-v2

Merging Process

The model was created using the safetensors-merge-supermario merging script, which combines multiple checkpoints into a single model. This approach aims to leverage the distinct strengths and fine-tuning of each constituent model, potentially yielding better generalization or stronger performance across several domains than any single component.
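The "dare" in the model's name refers to DARE (Drop And REscale) style delta merging, which the stated hyperparameters (p=0.12, lambda=2.1) suggest. The sketch below is a simplified single-tensor illustration under the assumption that p is the drop probability and lambda a global scale on each delta; the function name and the simple summation across models are illustrative, not taken from the actual tool, which operates on full checkpoints.

```python
import numpy as np

def dare_merge(base, finetuned_list, p=0.12, lam=2.1, seed=0):
    """Sketch of DARE-style merging for a single weight tensor.

    For each fine-tuned model, the delta from the base weights is
    randomly dropped with probability p, the survivors are rescaled
    by 1/(1-p) to preserve the expected magnitude, scaled by lambda,
    and added back onto the base weights.
    """
    rng = np.random.default_rng(seed)
    merged = base.copy()
    for ft in finetuned_list:
        delta = ft - base                    # "task vector" of this model
        mask = rng.random(delta.shape) >= p  # keep each entry with prob 1-p
        merged += lam * (delta * mask) / (1.0 - p)
    return merged

# Toy example: merge two "fine-tunes" of an all-zeros base tensor.
merged = dare_merge(np.zeros(4), [np.ones(4), np.full(4, 2.0)], p=0.0, lam=1.0)
```

With p=0.0 nothing is dropped, so the toy merge reduces to the plain sum of deltas; with the model card's p=0.12, roughly 12% of each delta's entries are zeroed before rescaling.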

Potential Use Cases

Given its diverse origins, this merged model could be suitable for a range of applications where a blend of instruction-following, coding capabilities, and general conversational prowess is beneficial. Its 7B parameter size makes it efficient for deployment while still offering strong performance.
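Since the base model is Mistral-7B-Instruct-v0.2, prompts in the standard Mistral instruct format are a reasonable starting point. The helper below is a hypothetical convenience function, not part of the model card; it simply wraps text in the `[INST] ... [/INST]` markers that Mistral-Instruct models are trained on.

```python
def format_mistral_prompt(user_message: str, system: str = "") -> str:
    """Wrap a user message in the Mistral-Instruct [INST] ... [/INST] format.

    Mistral-Instruct v0.2 has no dedicated system token, so any system
    text is simply prepended to the first user turn.
    """
    content = f"{system}\n\n{user_message}" if system else user_message
    return f"<s>[INST] {content} [/INST]"

prompt = format_mistral_prompt("Summarize DARE merging in one sentence.")
```

The resulting string can be passed to any inference stack that serves the model; when using a library that applies a chat template automatically, skip this manual formatting to avoid doubled instruction markers.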

Popular Sampler Settings

The most popular sampler configurations among Featherless users for this model adjust the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
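These parameters can be supplied in a standard OpenAI-compatible completion request. The values below are illustrative placeholders, not measured user settings, and the request body assumes an endpoint that accepts the extended sampler fields (top_k, repetition_penalty, min_p) alongside the standard OpenAI ones.

```python
import json

# Illustrative sampler config; the values are hypothetical examples.
sampler_config = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}

# Body of an OpenAI-compatible /v1/completions request.
request_body = {
    "model": "martyn/mistral-megamerge-dare-7b",
    "prompt": "Write a haiku about merging models.",
    "max_tokens": 64,
    **sampler_config,
}

payload = json.dumps(request_body)
```

Serialized this way, the payload can be POSTed to any server exposing an OpenAI-compatible completions route for this model.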