CultriX/MergeTrix-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Jan 15, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

CultriX/MergeTrix-7B is a 7 billion parameter language model created by CultriX, formed by merging udkai/Turdus with abideen/NexoNimbus-7B, fblgit/UNA-TheBeagle-7b-v1, and argilla/distilabeled-Marcoro14-7B-slerp using the DARE TIES merge method. With a 4096-token context length, it delivers strong overall performance; a subtle DPO contamination inherited from its base model slightly affects Winogrande scores while modestly improving other benchmarks. It is suited to general language generation tasks where balanced performance across metrics is desired.


CultriX/MergeTrix-7B: A Merged Language Model

CultriX/MergeTrix-7B is a 7 billion parameter model developed by CultriX, created through a DARE TIES merge of several base models. The primary base model is udkai/Turdus, combined with abideen/NexoNimbus-7B, fblgit/UNA-TheBeagle-7b-v1, and argilla/distilabeled-Marcoro14-7B-slerp. This merging strategy aims to leverage the strengths of its constituent models to achieve robust overall performance.
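
To make the merge strategy concrete, the sketch below illustrates the core DARE TIES idea on a single weight tensor: each constituent model's delta from the base is randomly sparsified and rescaled (DARE), then the surviving deltas are combined under a sign-consensus rule (TIES). This is a simplified, hypothetical illustration rather than the actual mergekit implementation; the function names and the default density are placeholders.

```python
import torch

def dare_sparsify(delta: torch.Tensor, density: float = 0.5) -> torch.Tensor:
    # DARE: randomly drop (1 - density) of the delta entries and rescale the rest
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

def dare_ties_merge(base: torch.Tensor,
                    finetuned: list[torch.Tensor],
                    weights: list[float],
                    density: float = 0.5) -> torch.Tensor:
    # Task vectors: difference between each fine-tuned tensor and the base,
    # sparsified with DARE and scaled by the per-model merge weight
    deltas = [dare_sparsify(ft - base, density) * w
              for ft, w in zip(finetuned, weights)]
    stacked = torch.stack(deltas)
    # TIES-style sign election: keep only contributions whose sign agrees
    # with the majority, then average the survivors
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged
```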

Key Characteristics

  • Merge Method: Uses the DARE TIES merge method, configured with int8_mask and bfloat16 dtype for efficiency (a configuration sketch follows this list).
  • Base Model Influence: Built upon udkai/Turdus, which is noted for subtle DPO-contamination. This contamination may slightly affect Winogrande evaluation scores but has been observed to increase accuracy on other benchmarks like MMLU and GSM8K by approximately 0.2%.
  • Performance: Despite the noted contamination, the creator describes the model's overall performance as "quite impressive", indicating strong general capability.
  • Accessibility: GGUF versions are available for local inference, making deployment practical across a range of hardware setups.
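
As a rough sketch of how such a merge is specified in practice, the example below writes a mergekit-style configuration and runs it through the mergekit-yaml command-line tool. Only the model list, the dare_ties method, int8_mask, and the bfloat16 dtype come from the description above; the density and weight values, file paths, and output directory are placeholders, not the settings actually used for MergeTrix-7B.

```python
import subprocess
from pathlib import Path

# Hypothetical mergekit configuration; density/weight values are placeholders.
CONFIG = """\
models:
  - model: udkai/Turdus
    # base model: no parameters needed
  - model: abideen/NexoNimbus-7B
    parameters:
      density: 0.5   # placeholder
      weight: 0.4    # placeholder
  - model: fblgit/UNA-TheBeagle-7b-v1
    parameters:
      density: 0.5   # placeholder
      weight: 0.3    # placeholder
  - model: argilla/distilabeled-Marcoro14-7B-slerp
    parameters:
      density: 0.5   # placeholder
      weight: 0.3    # placeholder
merge_method: dare_ties
base_model: udkai/Turdus
parameters:
  int8_mask: true
dtype: bfloat16
"""

Path("merge-config.yml").write_text(CONFIG)
# Run the merge with mergekit (assumes `pip install mergekit`)
subprocess.run(["mergekit-yaml", "merge-config.yml", "./MergeTrix-7B-merged"], check=True)
```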

Good for

  • General Language Generation: Suitable for a wide array of text generation tasks where balanced performance is beneficial (a minimal usage example follows this list).
  • Exploration of Merged Models: Offers a practical example of a DARE TIES merge, providing insights into combining different LLMs.
  • Community Contribution: Developed by an amateur for community use; feedback and further development are welcome.
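
A minimal usage sketch with the Hugging Face transformers library, assuming enough GPU or CPU memory for a 7B model; the prompt and generation settings are arbitrary:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "CultriX/MergeTrix-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = generator(
    "Explain what a DARE TIES model merge is in one paragraph.",
    max_new_tokens=200, do_sample=True, temperature=0.7,
)
print(output[0]["generated_text"])
```

For the GGUF builds mentioned above, a llama.cpp-based runtime can be used instead for lighter-weight local inference.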