Overview
This model, SanjiWatsuki/neural-chat-7b-v3-3-wizardmath-dare-me, is a 7-billion-parameter experimental merge of mistralai/Mistral-7B-v0.1, WizardLM/WizardMath-7B-V1.1, and Intel/neural-chat-7b-v3-3. The primary goal of this project was to experiment with a novel model-merging strategy that combines DARE-TIES (DARE: Drop And REscale, with TIES-style sign election) with task arithmetic. The hypothesis was that this combined approach could transfer skills between finetuned models with less performance degradation, particularly for 7B models, where DARE-TIES alone can be less stable.
Experimental Approach
The merging process involved two main steps:
- DARE-TIES merge: First, a DARE-TIES merge was performed between Mistral-7B-v0.1 and WizardMath-7B-V1.1 with a density of 0.3, meaning 70% of the delta parameters were dropped (and the survivors rescaled). DARE-TIES is known for its ability to transfer a model's strengths while merging only a small fraction of its delta parameters.
- Task arithmetic merge: The resulting low-density DARE-TIES model was then combined with Intel/neural-chat-7b-v3-3 using task arithmetic. This second step aimed to integrate the strengths of neural-chat-7b-v3-3 while preserving the skill transfer from the DARE-TIES step.
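The two steps above can be sketched on toy weight tensors. This is a minimal NumPy illustration of plain DARE (random drop of deltas plus rescaling by 1/density) followed by task arithmetic; it omits the TIES sign-election step, and the function names and weights are illustrative, not the actual merge configuration:

```python
import numpy as np

def dare_merge(base, finetuned, density=0.3, seed=0):
    """DARE (Drop And REscale): randomly drop a fraction (1 - density) of the
    delta parameters (finetuned - base), then rescale the survivors by
    1/density so the expected delta magnitude is preserved."""
    rng = np.random.default_rng(seed)
    delta = finetuned - base
    mask = rng.random(delta.shape) < density  # keep ~density of the deltas
    return base + (delta * mask) / density

def task_arithmetic_merge(base, models, weights):
    """Task arithmetic: add weighted task vectors (model - base) to the base."""
    merged = base.astype(float).copy()
    for model, weight in zip(models, weights):
        merged += weight * (model - base)
    return merged

# Toy example: merge a "math" delta into the base at density 0.3,
# then combine the result with a second finetune via task arithmetic.
base = np.zeros(1000)
math_model = np.ones(1000)          # stands in for WizardMath weights
chat_model = np.full(1000, 0.5)     # stands in for neural-chat weights

dare_result = dare_merge(base, math_model, density=0.3)
final = task_arithmetic_merge(base, [dare_result, chat_model], [1.0, 1.0])
```

Because each surviving delta is rescaled by 1/density, the merged tensor keeps roughly the same expected shift as the full finetune, even though 70% of the deltas are zeroed out.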
Outcome
The experiment did not achieve the desired outcome: the combined strategy led to significant degradation in the model's overall performance. This model is preserved as a documented experiment in advanced merging techniques, highlighting the challenges of combining different merge strategies for 7B models.
Licensing
WizardMath is under the Microsoft Research License, while Intel/neural-chat-7b-v3-3 is under Apache 2.0.