harshitv804/MetaMath-Mistral-2x7B

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Mar 9, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

harshitv804/MetaMath-Mistral-2x7B is an experimental 7-billion-parameter Mixture of Experts (MoE) model based on the Mistral architecture, created by harshitv804. It was built with the SLERP merge method from two instances of the meta-math/MetaMath-Mistral-7B model. It is intended for exploring MoE concepts and, given its MetaMath-Mistral-7B foundation, is aimed at mathematical reasoning tasks.


Model Overview

harshitv804/MetaMath-Mistral-2x7B is an experimental 7-billion-parameter Mixture of Experts (MoE) model developed by harshitv804. It is built on the Mistral architecture and uses meta-math/MetaMath-Mistral-7B as its base model. Its primary purpose is experimentation with, and learning about, MoE architectures.

Merge Details

This model was created with the mergekit tool using the SLERP (Spherical Linear Interpolation) merge method, which combined two instances of the meta-math/MetaMath-Mistral-7B model into this MoE configuration. The merge applies separate interpolation weights to the self-attention and MLP layers, as detailed in the provided YAML configuration.
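A minimal pure-Python sketch of SLERP as it is typically applied to model merging, assuming each weight tensor has been flattened to a list of floats. It follows the common recipe (measure the angle between the two tensors, interpolate along the arc between them, and fall back to linear interpolation when they are nearly parallel), but it is not mergekit's code. The factors in `T_BY_LAYER_TYPE` and the helper `t_for` are hypothetical stand-ins for the per-layer weighting in the YAML configuration; their values are placeholders, not this model's actual settings.

```python
import math
from typing import List

# Hypothetical interpolation factors per layer type (placeholders only;
# the real values come from the model's YAML configuration).
T_BY_LAYER_TYPE = {"self_attn": 0.3, "mlp": 0.7}
DEFAULT_T = 0.5
DOT_THRESHOLD = 0.9995  # treat nearly parallel tensors as parallel


def t_for(tensor_name: str) -> float:
    """Pick an interpolation factor from the tensor's name (hypothetical rule)."""
    for key, t in T_BY_LAYER_TYPE.items():
        if key in tensor_name:
            return t
    return DEFAULT_T


def lerp(t: float, v0: List[float], v1: List[float]) -> List[float]:
    """Plain linear interpolation: (1 - t) * v0 + t * v1."""
    return [(1 - t) * a + t * b for a, b in zip(v0, v1)]


def slerp(t: float, v0: List[float], v1: List[float]) -> List[float]:
    """Spherical linear interpolation between two flattened weight tensors."""
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 == 0 or norm1 == 0:
        return lerp(t, v0, v1)
    # Cosine of the angle between the tensors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > DOT_THRESHOLD:
        # Nearly colinear: sin(theta) is close to 0 and the spherical
        # formula is unstable, so use linear interpolation instead.
        return lerp(t, v0, v1)
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    # Weights sin((1 - t) * theta) / sin(theta) and sin(t * theta) / sin(theta).
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    name = "model.layers.0.mlp.up_proj.weight"
    print(slerp(t_for(name), [1.0, 0.0], [0.0, 1.0]))
```

Because the nearly-parallel branch falls back to linear interpolation, SLERP between two identical tensors returns that tensor unchanged (up to floating-point rounding) for any `t`.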

Key Capabilities

  • Mathematical Reasoning: Inherits the mathematical problem-solving capabilities of its MetaMath-Mistral-7B base.
  • Mixture of Experts Architecture: Provides a practical example and platform for understanding and experimenting with MoE models.
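For readers new to MoE, the core routing idea can be sketched as follows: a gate scores every expert for a given token, only the top-k experts run, and their outputs are mixed with softmax weights renormalized over the selected experts. This is a generic toy illustration of Mixtral-style top-k routing in pure Python; the function names, the dot-product gate, and the toy experts are assumptions for demonstration and do not describe this model's actual layers.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(
    x: Vector,
    gate_weights: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    top_k: int = 2,
) -> Vector:
    """Route one token vector through its top-k experts (toy example)."""
    # Router logits: one score per expert (dot product of gate row and token).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the k highest-scoring experts for this token.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate scores over the selected experts only.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Two toy "experts": one doubles the input, one negates it.
    experts = [lambda v: [2 * a for a in v], lambda v: [-a for a in v]]
    gates = [[1.0, 0.0], [0.0, 1.0]]
    print(moe_layer([2.0, 1.0], gates, experts, top_k=1))  # → [4.0, 2.0]
```

Only the selected experts are evaluated per token, which is why an MoE layer's compute cost depends on `top_k` rather than on the total number of experts.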

Intended Use

This model is suitable for researchers and developers interested in:

  • Exploring the behavior and performance of Mixture of Experts models.
  • Benchmarking an MoE-based approach on mathematical reasoning tasks.
  • Learning about model merging techniques, specifically SLERP, for creating custom LLMs.
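For benchmarking, each completion has to be reduced to a final answer before it can be scored. The sketch below assumes completions end with a line of the form `The answer is: <value>`, the answer format used in MetaMath-style training data (treat this as an assumption and check it against real outputs); when the marker is missing, it falls back to the last number in the text. `extract_answer`, `is_correct`, and `accuracy` are hypothetical helpers, and numeric answers are compared as floats within a small tolerance.

```python
import re
from typing import List, Optional

# Assumed final-answer marker in MetaMath-style completions.
ANSWER_PATTERN = re.compile(r"The answer is:?\s*(.+)", re.IGNORECASE)
# Integers or decimals, optionally with thousands separators (e.g. "1,250.5").
NUMBER_PATTERN = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_answer(completion: str) -> Optional[str]:
    """Return the final numeric answer in a completion, or None if absent."""
    markers = ANSWER_PATTERN.findall(completion)
    if markers:
        # Use the first number after the last "The answer is" marker.
        numbers = NUMBER_PATTERN.findall(markers[-1])
        candidate = numbers[0] if numbers else None
    else:
        # No marker: fall back to the last number anywhere in the text.
        numbers = NUMBER_PATTERN.findall(completion)
        candidate = numbers[-1] if numbers else None
    return candidate.replace(",", "") if candidate is not None else None


def is_correct(pred: Optional[str], gold: str, tol: float = 1e-6) -> bool:
    """Compare numerically when possible, otherwise as stripped strings."""
    if pred is None:
        return False
    try:
        return abs(float(pred) - float(gold.replace(",", ""))) <= tol
    except ValueError:
        return pred.strip() == gold.strip()


def accuracy(completions: List[str], golds: List[str]) -> float:
    """Exact-match accuracy of extracted answers against gold answers."""
    if not completions:
        return 0.0
    hits = sum(is_correct(extract_answer(c), g) for c, g in zip(completions, golds))
    return hits / len(completions)


if __name__ == "__main__":
    outputs = [
        "3 + 4 = 7, so she has 7 apples.\nThe answer is: 7",
        "The total is 1,250 dollars.",
    ]
    print(accuracy(outputs, ["7", "1250"]))  # → 1.0
```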