monology/mixtral-soup

  • Type: Text Generation
  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Mar 20, 2024
  • License: apache-2.0
  • Architecture: Transformer
  • Availability: Open Weights

monology/mixtral-soup is an experimental 7B-parameter language model created by monology, formed by a linear merge of eight distinct Mixtral-expert models. It draws on the specialized experts of the Mixture-of-Experts architecture, combining them into a single model in the hope of improving performance across a range of tasks. The release is experimental, exploring what merged Mixtral experts can do within a 4096-token context window.


Model Overview

monology/mixtral-soup is an experimental 7B parameter language model developed by monology. It is constructed using the mergekit tool, specifically employing a linear merge method to combine eight different pre-trained Mixtral-expert models. This approach aims to explore the synergistic potential of combining specialized expert models.
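Assuming the repository ships standard Hugging Face weight and tokenizer files (worth verifying in the repo itself), the merged model should load like any other Mistral-family causal LM. A minimal usage sketch with the transformers library:

```python
# Minimal sketch, assuming standard Hugging Face weights and tokenizer files.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "monology/mixtral-soup"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the merge itself was produced in float16
    device_map="auto",
)

prompt = "The key idea behind model merging is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```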

Merge Details

The model integrates the following expert models:

  • monology/mixtral-expert0
  • monology/mixtral-expert1
  • monology/mixtral-expert2
  • monology/mixtral-expert3
  • monology/mixtral-expert4
  • monology/mixtral-expert5
  • monology/mixtral-expert6
  • monology/mixtral-expert7

Each expert model was given an equal weight of 1.0, and the merge was performed in float16. The model exists primarily for experimental evaluation; its performance characteristics are still being assessed.
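Since every weight is 1.0, the linear merge reduces to a parameter-wise mean of the eight checkpoints. The Python sketch below illustrates that idea with plain transformers/PyTorch state dicts; it is not mergekit's actual implementation, the model names simply follow the expert list above, and a real merge of eight 7B checkpoints would need sharded loading rather than holding everything in memory at once:

```python
# Illustrative sketch of an equal-weight linear merge, not mergekit itself.
# Assumes all checkpoints share identical architectures and parameter names.
import torch
from transformers import AutoModelForCausalLM

expert_ids = [f"monology/mixtral-expert{i}" for i in range(8)]

merged = None
for expert_id in expert_ids:
    state = AutoModelForCausalLM.from_pretrained(
        expert_id, torch_dtype=torch.float16
    ).state_dict()
    if merged is None:
        # Accumulate in float32 to limit rounding error; cast back at the end.
        merged = {k: v.float() for k, v in state.items()}
    else:
        for k, v in state.items():
            merged[k] += v.float()

# Equal weights of 1.0 across eight models normalize to a simple mean.
merged = {k: (v / len(expert_ids)).to(torch.float16) for k, v in merged.items()}

# Load the averaged parameters into a model instance and save the result.
model = AutoModelForCausalLM.from_pretrained(
    expert_ids[0], torch_dtype=torch.float16
)
model.load_state_dict(merged)
model.save_pretrained("mixtral-soup-local")  # hypothetical output path
```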

Intended Use

This model is provided for experimental purposes only, to investigate the outcome of merging multiple Mixtral-expert components. Users interested in model merging techniques, or in how a model built from combined Mixture-of-Experts experts behaves, may find it a useful reference.