stevez80/ErebusNeuralSamir-7B-dare-ties

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Mar 9, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

stevez80/ErebusNeuralSamir-7B-dare-ties is a 7 billion parameter language model created by stevez80 on the Mistral-7B-v0.1 architecture. It is a DARE TIES merge of SamirGPT-v1, NeuralHermes-2.5-Mistral-7B, and Mistral-7B-Erebus-v3, intended to combine the strengths of its constituent models. The model supports a 4096-token context length and was merged with int8_mask enabled and bfloat16 as the working dtype.

ErebusNeuralSamir-7B-dare-ties Overview

ErebusNeuralSamir-7B-dare-ties is a 7 billion parameter language model developed by stevez80. It is constructed with the DARE TIES merging method, combining three models that share the Mistral-7B-v0.1 architecture: samir-fama/SamirGPT-v1, mlabonne/NeuralHermes-2.5-Mistral-7B, and KoboldAI/Mistral-7B-Erebus-v3. In a DARE TIES merge, each fine-tune's parameter deltas from the base model are randomly sparsified and rescaled (DARE), then sign conflicts among the surviving deltas are resolved (TIES) before the weighted deltas are added back onto the base; the goal is to retain the distinctive capabilities of each constituent model while limiting interference between their updates.
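
To make the mechanics concrete, here is a minimal sketch of the DARE step (drop-and-rescale) applied to per-model task vectors, using the densities and weights listed in the configuration below. It is illustrative only, not mergekit's implementation, and it omits the TIES sign-election step that would normally run before the deltas are summed.

```python
# Illustrative sketch of DARE (drop-and-rescale); not mergekit's code.
# The TIES sign-election step that normally follows is omitted for brevity.
import torch

def dare(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Randomly keep `density` of the delta's entries, rescaling survivors
    by 1/density so the expected value of the delta is preserved."""
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

# delta_i = finetuned_weight_i - base_weight for each merged model.
base = torch.zeros(4, 4)                        # stand-in for a base tensor
deltas = [torch.randn(4, 4) for _ in range(3)]  # stand-ins for the three
                                                # fine-tunes' task vectors
weights = [0.3, 0.3, 0.4]                       # weights from the merge config

merged = base + sum(w * dare(d, density=0.53) for w, d in zip(weights, deltas))
```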

Key Configuration Details

  • Base Model: mistralai/Mistral-7B-v0.1
  • Merge Method: DARE TIES, with specific density and weight parameters applied to each merged component.
  • Merged Models:
    • samir-fama/SamirGPT-v1 (density: 0.53, weight: 0.3)
    • mlabonne/NeuralHermes-2.5-Mistral-7B (density: 0.53, weight: 0.3)
    • KoboldAI/Mistral-7B-Erebus-v3 (density: 0.53, weight: 0.4)
  • Parameters: int8_mask is enabled, which has mergekit store its intermediate merge masks in int8 to reduce memory use during merging, and bfloat16 is the working dtype (a reconstructed configuration appears after this list).
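
For reference, the parameters above correspond to a mergekit configuration along the following lines. This is a reconstruction from the values listed here, not the author's published file, so the exact layout may differ.

```python
# Reconstructed mergekit config from the parameters above (an assumption:
# the author's published file may differ in layout or extra options).
config = """\
models:
  - model: samir-fama/SamirGPT-v1
    parameters:
      density: 0.53
      weight: 0.3
  - model: mlabonne/NeuralHermes-2.5-Mistral-7B
    parameters:
      density: 0.53
      weight: 0.3
  - model: KoboldAI/Mistral-7B-Erebus-v3
    parameters:
      density: 0.53
      weight: 0.4
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
"""

with open("merge-config.yml", "w") as f:
    f.write(config)
# The merge would then be run with mergekit's CLI, e.g.:
#   mergekit-yaml merge-config.yml ./ErebusNeuralSamir-7B-dare-ties
```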

Potential Use Cases

Given its merged composition, this model is likely suited to applications that benefit from a blend of its constituents' strengths. Developers looking for a 7B-parameter model with a 4096-token context length, assembled from well-regarded Mistral-7B fine-tunes, may find it useful for general-purpose text generation. A minimal loading sketch follows.
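
As a starting point, the model can be loaded like any other Mistral-based causal LM via Hugging Face transformers. This is a minimal sketch assuming the weights are hosted under the repo id above and that enough accelerator memory is available for a 7B model in bfloat16 (roughly 14 GB).

```python
# Minimal loading and generation sketch with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stevez80/ErebusNeuralSamir-7B-dare-ties"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype
    device_map="auto",           # place layers on available devices
)

prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```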