MarcMistral-7B Overview
MarcMistral-7B is an experimental 7 billion parameter language model developed by flemmingmiguel. It is constructed as a merge of two distinct models: nfaheem/Marcoroni-7b-DPO-Merge and EmbeddedLLM/Mistral-7B-Merge-14-v0.5, utilizing the LazyMergekit framework. This merge strategy is an ongoing experiment to identify the most effective base model combination for subsequent fine-tuning efforts.
Key Characteristics
- Experimental Merge: Designed to test the synergy between models excelling in different benchmarks, specifically combining a model with high MMLU (Massive Multitask Language Understanding) performance with one strong in ARC (AI2 Reasoning Challenge).
- Component Models: Built upon nfaheem/Marcoroni-7b-DPO-Merge and EmbeddedLLM/Mistral-7B-Merge-14-v0.5, aiming to leverage their respective strengths.
- Configuration: The merge uses the slerp method, with specific interpolation weights for the self-attention and MLP layers and a fallback value for all other tensors.
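A LazyMergekit slerp configuration of this shape typically looks like the sketch below. The model names come from this card; the layer ranges, interpolation schedules (`t` values), base model choice, and dtype are illustrative assumptions, not the actual values used for MarcMistral-7B:

```yaml
# Illustrative mergekit/LazyMergekit slerp config -- values are assumptions.
slices:
  - sources:
      - model: nfaheem/Marcoroni-7b-DPO-Merge
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.5
        layer_range: [0, 32]
merge_method: slerp
base_model: nfaheem/Marcoroni-7b-DPO-Merge
parameters:
  t:
    - filter: self_attn        # weighting for self-attention tensors
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp              # weighting for MLP tensors
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5               # fallback for all other tensors
dtype: bfloat16
```

The per-filter `t` schedules let different layer types lean toward one parent model or the other, while the final unfiltered `value` supplies the fallback mentioned above.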
Intended Use
This model is primarily intended for researchers and developers looking to:
- Explore Model Merging: Investigate the effects of combining different pre-trained models to achieve specific performance profiles.
- Base for Fine-tuning: Serve as a foundational model for further domain-specific or task-specific fine-tuning, with the goal of identifying a "clear winner" in benchmarks among various experimental merges.
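For intuition about the slerp merge method used above, here is a minimal plain-Python sketch of spherical linear interpolation between two weight vectors. This is illustrative only and not mergekit's actual implementation, which operates tensor-by-tensor with layer-dependent interpolation factors:

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    Illustrative sketch: mergekit applies this per tensor, with
    different t schedules for self-attention and MLP weights and a
    fallback t for everything else.
    """
    # Angle between the two vectors, from their normalized dot product.
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    dot = max(-1.0, min(1.0, dot))  # clamp for numerical safety
    theta = math.acos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# t=0 recovers the first model's weights, t=1 the second's;
# intermediate t blends them along the sphere rather than a straight line.
print(slerp(0.0, [1.0, 0.0], [0.0, 1.0]))
print(slerp(1.0, [1.0, 0.0], [0.0, 1.0]))
```

Compared with plain linear averaging, slerp preserves the magnitude of the interpolated weights, which is one reason it is a popular choice for model merges.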