Model Overview
harshitv804/MetaMath-Mistral-2x7B is an experimental Mixture of Experts (MoE) model developed by harshitv804, composed of two 7-billion-parameter experts. It is built on the Mistral architecture and uses meta-math/MetaMath-Mistral-7B as its base model. The model is intended primarily for experimenting with and learning about MoE architectures.
Merge Details
This model was created using the mergekit tool, employing the SLERP (Spherical Linear Interpolation) merge method. Two instances of the meta-math/MetaMath-Mistral-7B model were merged to form this MoE configuration. The merge process involved specific parameter weighting for self-attention and MLP layers, as detailed in the provided YAML configuration.
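The exact configuration is the YAML file provided with the model; the snippet below is only an illustrative sketch of what a mergekit SLERP configuration of this shape typically looks like. The layer ranges and interpolation values (`t`) are placeholder assumptions, not the settings actually used for this model.

```yaml
# Illustrative mergekit SLERP config: two copies of MetaMath-Mistral-7B,
# with separate interpolation curves for self-attention and MLP weights.
# Layer ranges and t values are placeholders, not this model's real settings.
slices:
  - sources:
      - model: meta-math/MetaMath-Mistral-7B
        layer_range: [0, 32]
      - model: meta-math/MetaMath-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: meta-math/MetaMath-Mistral-7B
parameters:
  t:
    - filter: self_attn
      value: [0.0, 0.5, 0.3, 0.7, 1.0]
    - filter: mlp
      value: [1.0, 0.5, 0.7, 0.3, 0.0]
    - value: 0.5
dtype: bfloat16
```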
Key Capabilities
- Mathematical Reasoning: Inherits strong mathematical problem-solving capabilities from its MetaMath-Mistral-7B base (see the inference sketch after this list).
- Mixture of Experts Architecture: Provides a practical example and platform for understanding and experimenting with MoE models.
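A minimal inference sketch using Hugging Face transformers is shown below. The Alpaca-style prompt template is the one commonly used with MetaMath models and is assumed here rather than confirmed by this card.

```python
# Minimal sketch: load the model and prompt it with a math word problem.
# Assumes transformers and accelerate are installed (for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "harshitv804/MetaMath-Mistral-2x7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
# Alpaca-style MetaMath prompt (assumed template, not stated in this card).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{question}\n\n### Response: Let's think step by step."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```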
Intended Use
This model is suitable for researchers and developers interested in:
- Exploring the behavior and performance of Mixture of Experts models.
- Benchmarking mathematical reasoning tasks with an MoE-based approach.
- Learning about model merging techniques, specifically SLERP, for creating custom LLMs (a toy SLERP interpolation is sketched below).
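For intuition about what SLERP does to model weights, the following is a toy, self-contained sketch of spherical linear interpolation between two weight tensors. It is a conceptual illustration only, not mergekit's actual implementation.

```python
# Toy SLERP between two flattened weight tensors: interpolate along the arc
# between them rather than along the straight line (as plain averaging would).
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    a = w_a.flatten().float()
    b = w_b.flatten().float()
    # Angle between the two weight vectors.
    cos_omega = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
    omega = torch.arccos(cos_omega)
    if omega.abs() < 1e-4:
        # Nearly parallel vectors: fall back to linear interpolation.
        merged = (1.0 - t) * a + t * b
    else:
        merged = (torch.sin((1.0 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.reshape(w_a.shape)

# Example: interpolate two random "weight matrices" halfway.
w1, w2 = torch.randn(4, 4), torch.randn(4, 4)
print(slerp(w1, w2, t=0.5))
```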