Azazelle/Moko-DARE

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Mar 22, 2024 · License: cc-by-4.0 · Architecture: Transformer · Open Weights

Azazelle/Moko-DARE is a 7-billion-parameter language model merge, built on the Mistral-7B-v0.1 base using the DARE TIES method. It integrates capabilities from Open-Orca/Mistral-7B-OpenOrca, akjindal53244/Mistral-7B-v0.1-Open-Platypus, and WizardLM/WizardMath-7B-V1.1, aiming to combine general instruction following with stronger mathematical reasoning and balanced performance across diverse NLP tasks within a 4096-token context window.
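
A minimal usage sketch, assuming the standard Hugging Face transformers API; the prompt and generation settings below are illustrative and not taken from the model card:

```python
# Minimal sketch: loading Moko-DARE for text generation with Hugging Face transformers.
# The prompt and sampling settings are illustrative defaults, not recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Azazelle/Moko-DARE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```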


Moko-DARE: A DARE TIES Merged Language Model

Moko-DARE is a 7-billion-parameter language model developed by Azazelle, created by merging several pre-trained models with the DARE TIES method. The approach draws on the strengths of multiple specialized models to produce a single, more versatile and capable model.

Key Capabilities & Merged Components

This model is built on the robust mistralai/Mistral-7B-v0.1 base and integrates the following models:

  • Open-Orca/Mistral-7B-OpenOrca: Contributes strong general instruction-following abilities.
  • akjindal53244/Mistral-7B-v0.1-Open-Platypus: Enhances instruction-tuned performance.
  • WizardLM/WizardMath-7B-V1.1: Provides specialized capabilities in mathematical reasoning and problem-solving.

Merge Methodology

The merge was executed with mergekit using the DARE TIES method: each component model's parameter deltas from the base are randomly sparsified and rescaled (DARE), then sign conflicts between the surviving deltas are resolved before merging (TIES). The configuration assigned specific density and weight values to each component model to balance the integration of their respective strengths. The goal of this strategy is a model that performs well across a broader range of tasks, in particular benefiting from WizardMath's mathematical ability while retaining strong general language understanding and instruction adherence.
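
A DARE TIES merge of this kind is typically defined in a mergekit YAML configuration and run through the mergekit-yaml command. The sketch below is a hypothetical reconstruction: the density and weight values are placeholders, not the actual Moko-DARE configuration.

```python
# Hypothetical reconstruction of a DARE TIES merge config for mergekit.
# The density and weight values are placeholders, not the actual Moko-DARE settings.
import subprocess
import textwrap

config = textwrap.dedent("""\
    merge_method: dare_ties
    base_model: mistralai/Mistral-7B-v0.1
    models:
      - model: Open-Orca/Mistral-7B-OpenOrca
        parameters:
          density: 0.5   # placeholder: fraction of delta parameters kept after random dropping
          weight: 0.3    # placeholder: scale applied to this model's contribution
      - model: akjindal53244/Mistral-7B-v0.1-Open-Platypus
        parameters:
          density: 0.5   # placeholder
          weight: 0.3    # placeholder
      - model: WizardLM/WizardMath-7B-V1.1
        parameters:
          density: 0.5   # placeholder
          weight: 0.4    # placeholder
    dtype: bfloat16
    """)

with open("moko_dare.yml", "w") as f:
    f.write(config)

# mergekit's CLI entry point writes the merged checkpoint to the given output directory.
subprocess.run(["mergekit-yaml", "moko_dare.yml", "./Moko-DARE"], check=True)
```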