NeverSleep/Mistral-11B-OmniMix-bf16 Overview
This 10.7-billion-parameter model from NeverSleep is an experimental merge of four Mistral-7B fine-tunes: Mistral-7B-OpenOrca, Mistral-7B-v0.1-Open-Platypus, CollectiveCognition-v1.1-Mistral-7B, and Zephyr-7b-alpha. It was built to investigate whether merging and layer manipulation with mergekit can produce high benchmark scores.
Key Characteristics
- Experimental Merge: Constructed by combining layers and parameters from multiple Mistral-7B models, specifically designed to explore the limits of model merging.
- Benchmark-Oriented: Developed with the explicit aim of scoring highly on various benchmarks, serving as a proof-of-concept for merge strategies.
- BFloat16 Precision: The model is intended to operate in bfloat16 precision.
- Context Length: Supports a context window of 4096 tokens.
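As a rough sanity check on the stated size and precision, the sketch below counts parameters for a Mistral-style decoder and estimates the bfloat16 weight footprint. The architecture constants are taken from the published Mistral-7B-v0.1 configuration and are assumed to carry over to the merge unchanged. The 48-layer figure is an assumption: it is one layer count consistent with the stated 10.7 billion parameters, not something this overview confirms.

```python
# Mistral-7B-v0.1 architecture values (assumed unchanged in the merged model).
HIDDEN = 4096          # hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 key/value heads x 128 dims per head (grouped-query attention)
VOCAB = 32000          # vocabulary size


def layer_params() -> int:
    """Parameters in one decoder layer (no biases)."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                        # gate, up, down projections
    norms = 2 * HIDDEN                                     # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params(num_layers: int) -> int:
    """Parameters in the whole model for a given number of decoder layers."""
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied output head
    final_norm = HIDDEN
    return num_layers * layer_params() + embeddings + final_norm


def bf16_gib(num_params: int) -> float:
    """Weight memory in GiB at 2 bytes per parameter (bfloat16)."""
    return num_params * 2 / 2**30


if __name__ == "__main__":
    for layers in (32, 48):  # 32 = Mistral-7B; 48 = assumed depth of the merge
        n = total_params(layers)
        print(f"{layers} layers: {n / 1e9:.2f}B params, {bf16_gib(n):.1f} GiB in bf16")
```

Under these assumptions, 32 layers gives about 7.24 billion parameters and 48 layers gives about 10.73 billion, so the weights alone would need just under 20 GiB in bfloat16, before activations and the KV cache.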
Intended Use and Philosophy
This model serves as a testbed to highlight two core ideas:
- Objectivity of Benchmarks: Shows that the choice of merging technique alone can significantly shift benchmark results, which raises the question of how objective those scores are.
- User Evaluation Encouraged: Users should test models directly on their own use cases rather than relying solely on reported scores. Despite its strong benchmark performance, this model still needs further fine-tuning for specific applications such as roleplay.
Prompt Templates
The recommended prompt template is a system/user/assistant format:
<|system|>
Below is an instruction that describes a task. Write a response that appropriately completes the request.
<|user|>
{prompt}
<|assistant|>
Alternative templates from the source models are also compatible.
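The recommended template can be filled in with a small helper such as the one below. This is a minimal sketch: `build_prompt` and `SYSTEM_MESSAGE` are hypothetical names introduced here, and the newline after `<|assistant|>` is an assumption, since the template does not say whether one is expected.

```python
# Default system message, copied from the recommended template.
SYSTEM_MESSAGE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(prompt: str, system: str = SYSTEM_MESSAGE) -> str:
    """Fill the system/user/assistant template with a user prompt.

    The trailing newline after <|assistant|> is an assumption; drop it if
    generations start with an unwanted blank line.
    """
    return (
        "<|system|>\n"
        f"{system}\n"
        "<|user|>\n"
        f"{prompt}\n"
        "<|assistant|>\n"
    )


if __name__ == "__main__":
    print(build_prompt("Summarize the benefits of model merging in two sentences."))
```

The returned string is what gets tokenized and passed to the model. Keep the prompt plus the expected completion within the 4096-token context window.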