andrijdavid/Macaroni-v2-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 5, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

Macaroni-v2-7b by andrijdavid is a 7 billion parameter language model created by merging flemmingmiguel/MBX-7B-v3, mlabonne/OmniBeagle-7B, and vanillaOVO/supermario_v4 using the DARE TIES method, with mistralai/Mistral-7B-v0.1 as its base. This model leverages the strengths of its constituent models to offer a versatile foundation for various natural language processing tasks. Its 4096-token context length supports moderate-length interactions and text generation.

Overview

Macaroni-v2-7b is a 7 billion parameter language model developed by andrijdavid. It was created using the DARE TIES merge method, combining several pre-trained models with mistralai/Mistral-7B-v0.1 serving as the base architecture. This merging technique aims to synthesize the capabilities of its component models into a single, more robust model.
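Assuming the weights are published on the Hugging Face Hub under andrijdavid/Macaroni-v2-7b (an assumption based on the naming convention used here), the model should load with the standard transformers text-generation workflow. The snippet below is an illustrative sketch, not an official usage example for this model.

```python
# Hypothetical usage sketch: loading Macaroni-v2-7b with Hugging Face transformers.
# Assumes the repository id "andrijdavid/Macaroni-v2-7b" and fp16 weights; adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "andrijdavid/Macaroni-v2-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # the merge configuration used a float16 dtype
    device_map="auto",
)

prompt = "Explain what a model merge is in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep the prompt plus generated tokens within the 4096-token context window.
outputs = model.generate(inputs.input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```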

Merge Details

This model is a product of merging three distinct models:

  • flemmingmiguel/MBX-7B-v3
  • mlabonne/OmniBeagle-7B
  • vanillaOVO/supermario_v4
The merge was performed with the DARE TIES method, which combines DARE (Drop And REscale) pruning of task vectors with TIES (Trim, Elect Sign, and Merge) conflict resolution, and is designed to combine models effectively while preserving their individual strengths. The configuration specified per-model density and weight parameters, along with int8_mask and normalize settings, and a float16 dtype.
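For intuition, the sketch below outlines how a DARE TIES merge operates on a single weight tensor: each fine-tuned model's delta from the base is randomly dropped and rescaled (DARE), a sign is elected per parameter, and only deltas agreeing with the elected sign are averaged back into the base (TIES). The function name, density/weight values, and toy tensors are illustrative assumptions, not the actual configuration used for this model.

```python
# Illustrative sketch of a DARE TIES merge on one tensor; the values here are
# made up and do not reflect the real Macaroni-v2-7b merge configuration.
import torch

def dare_ties_merge(base, finetuned_list, densities, weights):
    """Merge fine-tuned tensors into `base` using DARE (drop-and-rescale)
    followed by TIES-style sign election and averaging."""
    deltas = []
    for ft, density, w in zip(finetuned_list, densities, weights):
        delta = ft - base
        # DARE: randomly keep a `density` fraction of delta entries, rescale survivors.
        mask = torch.bernoulli(torch.full_like(delta, density))
        delta = delta * mask / density
        deltas.append(w * delta)

    stacked = torch.stack(deltas)                   # shape: [num_models, ...]
    # TIES: elect a sign per parameter from the sign of the summed deltas.
    elected_sign = torch.sign(stacked.sum(dim=0))
    # Keep only delta entries whose sign matches the elected sign.
    agree = torch.sign(stacked) == elected_sign
    kept = stacked * agree
    # Average the surviving contributions (avoid division by zero).
    counts = agree.sum(dim=0).clamp(min=1)
    merged_delta = kept.sum(dim=0) / counts
    return base + merged_delta

# Toy example with random tensors standing in for real model weights.
base = torch.randn(4, 4)
finetuned = [base + 0.1 * torch.randn(4, 4) for _ in range(3)]
merged = dare_ties_merge(base, finetuned, densities=[0.5, 0.5, 0.5], weights=[1.0, 1.0, 1.0])
```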

Potential Use Cases

Given its merged nature, Macaroni-v2-7b is likely suitable for a range of general-purpose NLP applications where a 7B parameter model with a 4096-token context window is appropriate. Its design suggests balanced performance across tasks, drawing on the diverse training of its constituent models.