DesivoMerge0.1: A Continuously Merged Language Model
DesivoMerge0.1 is a 7-billion-parameter language model developed by cris177 using the MergeKit framework. Its core idea is a continuous merging strategy: new models are iteratively merged in to improve overall performance.
Key Development Strategy
- Iterative Merging: The model began by merging open-orca-mistral-7B and open-hermes-7B. The resulting merge was then combined with TurdusBeagle-7B, identified as a top-performing 7B model on the Open LLM Leaderboard.
- Performance-Driven Integration: The merging process continues adding models until the average score of a merged model falls below that of the previous iteration, at which point the strategy backtracks and explores alternative models.
- Contamination Avoidance: Candidate models are reviewed before integration to avoid merging models known to be contaminated (i.e., trained on benchmark test data).
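The performance-driven loop above can be sketched as follows. This is an illustrative outline only, not cris177's actual pipeline: `merge` and `evaluate` are hypothetical stand-ins for a MergeKit slerp merge and a benchmark run (e.g. an Open LLM Leaderboard average).

```python
def merge(base: str, candidate: str) -> str:
    """Stand-in for a MergeKit slerp merge; returns a name for the merged model."""
    return f"merge({base}+{candidate})"

def evaluate(model: str, scores: dict) -> float:
    """Stand-in for benchmark evaluation; here scores are looked up from a table."""
    return scores[model]

def iterative_merge(base: str, candidates: list, scores: dict):
    """Greedily merge candidates, keeping a merge only if its score improves."""
    best, best_score = base, evaluate(base, scores)
    for cand in candidates:
        merged = merge(best, cand)
        score = evaluate(merged, scores)
        if score < best_score:
            # Backtrack: discard this merge and explore the next candidate.
            continue
        best, best_score = merged, score
    return best, best_score
```

In this sketch, a merge that lowers the average score is simply discarded and the next candidate model is tried, mirroring the backtracking behavior described above.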
Technical Configuration
The merge uses the slerp (spherical linear interpolation) method, with separate t (interpolation weight) parameters applied to the self_attn and mlp layers to fine-tune the merge ratios. The model uses bfloat16 as its dtype.
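A MergeKit slerp configuration of this shape might look like the following. This is a hedged sketch, not the model's actual config: the model names, layer ranges, and t values are illustrative placeholders taken from MergeKit's documented YAML schema.

```yaml
slices:
  - sources:
      - model: open-orca-mistral-7B   # placeholder name from this card
        layer_range: [0, 32]
      - model: open-hermes-7B         # placeholder name from this card
        layer_range: [0, 32]
merge_method: slerp
base_model: open-orca-mistral-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # per-layer interpolation weights (illustrative)
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5                      # default t for all other tensors
dtype: bfloat16
```

The filter entries let the self_attn and mlp tensors interpolate with different weights along the layer stack, while the final unfiltered value sets the default ratio for everything else.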
Potential Use Cases
DesivoMerge0.1 is suitable for a variety of general natural language processing tasks, benefiting from the combined strengths of its constituent models. Its iterative development approach aims to provide a robust and capable base for applications requiring strong text generation and comprehension.