Azazelle/Mocha-Dare-7b-ex

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Mar 23, 2024 · License: cc-by-4.0 · Architecture: Transformer · Open weights

Azazelle/Mocha-Dare-7b-ex is a 7 billion parameter language model based on the Mistral-7B-v0.1 architecture, created by Azazelle through a DARE TIES merge. This model integrates capabilities from Open-Orca/Mistral-7B-OpenOrca, akjindal53244/Mistral-7B-v0.1-Open-Platypus, and WizardLM/WizardMath-7B-V1.1. It is designed to combine general instruction following with enhanced mathematical reasoning and conversational abilities.


Overview

Mocha-Dare-7b-ex is a 7-billion-parameter language model developed by Azazelle on top of the mistralai/Mistral-7B-v0.1 base model. It was created with the DARE TIES merge method, which combines several fine-tuned models into a single checkpoint while limiting interference between their contributions.
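
As a quick usage sketch, the snippet below loads the model with the Hugging Face transformers library and runs a short generation. It assumes the merged weights are published on the Hub under the repo id Azazelle/Mocha-Dare-7b-ex and load like any other Mistral-7B-based checkpoint; the prompt, dtype, and generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Azazelle/Mocha-Dare-7b-ex"  # assumed Hugging Face Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision fits a single 24 GB GPU
    device_map="auto",
)

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```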

Key Capabilities

  • Instruction Following: Inherits general instruction-following abilities from Open-Orca/Mistral-7B-OpenOrca.
  • Conversational & Reasoning: Benefits from the akjindal53244/Mistral-7B-v0.1-Open-Platypus component, enhancing its conversational and reasoning skills.
  • Mathematical Reasoning: Incorporates WizardLM/WizardMath-7B-V1.1 to improve its performance on mathematical tasks and problem-solving.

Merge Details

The model was constructed with mergekit using the DARE TIES method. DARE randomly drops a fraction of each fine-tuned model's parameter deltas relative to the Mistral-7B-v0.1 base and rescales the surviving entries, while TIES resolves sign conflicts between the remaining deltas before they are summed onto the base weights. In the merge configuration, each component model is assigned density and weight values (specified as layer-wise gradients) that control how strongly it contributes. The goal is a single versatile model that performs well across general instruction following, conversation, and mathematical problem-solving.
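
For intuition, the sketch below shows the core DARE TIES arithmetic on a single flattened parameter tensor. It is a simplified illustration rather than the mergekit implementation: the function name dare_ties_merge, the uniform density and per-model weights arguments, and the normalization choice are assumptions made for this example.

```python
import numpy as np

def dare_ties_merge(base, finetuned, weights, density=0.5, seed=0):
    """Illustrative DARE TIES merge of flattened parameter vectors.

    base      -- base-model parameters (1-D array)
    finetuned -- list of fine-tuned parameter arrays, same shape as base
    weights   -- per-model merge weights
    density   -- fraction of each delta kept by DARE's random drop
    """
    rng = np.random.default_rng(seed)

    # DARE: drop each delta entry with probability (1 - density) and
    # rescale the survivors by 1/density to preserve the expected delta.
    deltas = []
    for ft in finetuned:
        delta = ft - base
        keep = rng.random(delta.shape) < density
        deltas.append(np.where(keep, delta / density, 0.0))

    # TIES: elect a per-parameter sign from the weighted sum of deltas.
    stacked = np.stack([w * d for w, d in zip(weights, deltas)])
    elected_sign = np.sign(stacked.sum(axis=0))

    # Keep only delta entries whose sign agrees with the elected sign,
    # then average them by the total weight of the agreeing models.
    agree = np.sign(stacked) == elected_sign
    kept = np.where(agree, stacked, 0.0)
    weight_mass = np.where(agree, np.asarray(weights)[:, None], 0.0).sum(axis=0)
    merged_delta = kept.sum(axis=0) / np.maximum(weight_mass, 1e-8)

    return base + merged_delta
```

A real merge applies this tensor by tensor over the full state dicts, and in mergekit the density and weight values can vary across layers via the gradient syntax; the uniform values here are for illustration only.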