Gweizheng/Marcoro14-7B-dare

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 3, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Gweizheng/Marcoro14-7B-dare is a 7 billion parameter language model created by Gweizheng, built upon the Mistral-7B-v0.1 architecture. This model is a merge of SamirGPT-v1, Slerp-CM-mist-dpo, and Mistral-7B-Merge-14-v0.2 using the dare_ties merging method. It is designed to combine the strengths of its constituent models, offering a versatile base for various natural language processing tasks.


Marcoro14-7B-dare Overview

Marcoro14-7B-dare is a 7 billion parameter language model developed by Gweizheng, based on the Mistral-7B-v0.1 architecture. This model is a product of a merge operation, combining three distinct models: SamirGPT-v1, Slerp-CM-mist-dpo, and Mistral-7B-Merge-14-v0.2.

Key Characteristics

  • Merge Method: Uses the dare_ties merging method, which combines DARE's random dropping and rescaling of fine-tuned delta weights with TIES-style sign resolution, allowing several fine-tunes to be folded into one model with limited interference.
  • Constituent Models: Incorporates contributions from:
    • samir-fama/SamirGPT-v1
    • abacusai/Slerp-CM-mist-dpo
    • EmbeddedLLM/Mistral-7B-Merge-14-v0.2
  • Base Architecture: Built upon the robust mistralai/Mistral-7B-v0.1 foundation.
  • Configuration: The merge assigns per-model density and weight parameters to each contributing model, with int8_mask enabled and a bfloat16 dtype for efficiency (see the configuration sketch after this list).
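
To make the configuration bullet concrete, the sketch below shows what a mergekit-style dare_ties recipe for this merge might look like, expressed as a Python dict and serialized to YAML. The density and weight values are illustrative placeholders, not the parameters actually used for Marcoro14-7B-dare.

```python
# Minimal sketch of a dare_ties merge recipe (mergekit-style).
# Density/weight values are placeholders, not the actual Marcoro14-7B-dare settings.
import yaml

merge_config = {
    "models": [
        # Each contributing model gets a density (fraction of delta weights kept)
        # and a weight (its share in the final blend).
        {"model": "samir-fama/SamirGPT-v1",
         "parameters": {"density": 0.5, "weight": 0.3}},
        {"model": "abacusai/Slerp-CM-mist-dpo",
         "parameters": {"density": 0.5, "weight": 0.3}},
        {"model": "EmbeddedLLM/Mistral-7B-Merge-14-v0.2",
         "parameters": {"density": 0.5, "weight": 0.3}},
    ],
    "merge_method": "dare_ties",
    "base_model": "mistralai/Mistral-7B-v0.1",
    "parameters": {"int8_mask": True},
    "dtype": "bfloat16",
}

# Write the recipe to disk so it can be passed to a merge tool such as mergekit.
with open("merge_config.yaml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)
```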

Potential Use Cases

Given its merged nature, Marcoro14-7B-dare is suitable for applications requiring a blend of capabilities from its source models. It can serve as a strong general-purpose language model for tasks such as text generation, summarization, and question answering, benefiting from the diverse training of its components.
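
As a quick illustration, the model can be loaded for text generation through the standard Hugging Face transformers API; the snippet below is a generic sketch, with the prompt and generation settings chosen as placeholders rather than taken from the model card.

```python
# Generic text-generation sketch using the Hugging Face transformers API.
# Prompt and sampling settings are illustrative, not recommendations from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gweizheng/Marcoro14-7B-dare"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used during the merge
    device_map="auto",
)

prompt = "Summarize the key ideas behind model merging in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```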