mlabonne/Darewin-7B

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Jan 23, 2024 | License: apache-2.0 | Architecture: Transformer

Darewin-7B is a 7-billion-parameter language model created by mlabonne by merging six Mistral-7B-based models with LazyMergekit using the dare_ties merge method. The merge aims to combine the strengths of its constituent models, including Intel/neural-chat-7b-v3-3 and openchat/openchat-3.5-0106, for balanced performance across reasoning and language understanding tasks. It averages 71.87 on the Open LLM Leaderboard, making it suitable for general-purpose applications that require robust language capabilities.


Darewin-7B: A Merged 7B Language Model

Darewin-7B is a 7-billion-parameter model developed by mlabonne, constructed by merging six different Mistral-7B-based models. It uses the dare_ties merge method via LazyMergekit to integrate diverse capabilities from its components, such as Intel/neural-chat-7b-v3-3, openaccess-ai-collective/DPOpenHermes-7B-v2, and openchat/openchat-3.5-0106.

Key Capabilities & Performance

Darewin-7B exhibits strong performance across a range of benchmarks, achieving an average score of 71.87 on the Open LLM Leaderboard. Notable scores include the following; a rough reproduction sketch appears after the list:

  • AI2 Reasoning Challenge (25-Shot): 68.60
  • HellaSwag (10-Shot): 86.22
  • MMLU (5-Shot): 65.21
  • GSM8k (5-Shot): 71.04
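
These few-shot settings match the Open LLM Leaderboard's evaluation harness. As a rough reproduction sketch using EleutherAI's lm-evaluation-harness (the task name, batch size, and result keys are assumptions about the harness setup, not details from this card):

```python
# Sketch: re-running one leaderboard task locally with lm-evaluation-harness
# (pip install lm-eval). Harness version and settings may differ from the
# leaderboard's exact setup, so scores may not match the card precisely.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mlabonne/Darewin-7B,dtype=bfloat16",
    tasks=["arc_challenge"],   # AI2 Reasoning Challenge
    num_fewshot=25,            # 25-shot, matching the leaderboard setting
    batch_size=8,              # illustrative; tune to available VRAM
)
print(results["results"]["arc_challenge"])
```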

The merge configuration uses a bfloat16 dtype and enables int8_mask, which stores the intermediate merge masks in int8 to reduce memory use during merging.
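
For illustration, a dare_ties merge of this kind is typically driven by a mergekit YAML config. The sketch below guesses at the shape of such a config: the base model and all density/weight values are placeholders, and only three of the six source models are named in this card; the method, dtype, and int8_mask come from the text above.

```python
# Sketch: writing a mergekit config and invoking the mergekit-yaml CLI
# (pip install mergekit). Placeholder values are marked as such.
import subprocess
from pathlib import Path

CONFIG = """\
models:
  - model: mistralai/Mistral-7B-v0.1    # assumed base model (placeholder)
  - model: Intel/neural-chat-7b-v3-3
    parameters:
      density: 0.6   # fraction of delta weights kept (placeholder)
      weight: 0.2    # contribution to the merge (placeholder)
  - model: openaccess-ai-collective/DPOpenHermes-7B-v2
    parameters:
      density: 0.6
      weight: 0.2
  - model: openchat/openchat-3.5-0106
    parameters:
      density: 0.6
      weight: 0.2
  # ...the actual merge includes three more Mistral-7B-based models...
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1   # placeholder
parameters:
  int8_mask: true                       # stated in this card
dtype: bfloat16                         # stated in this card
"""

Path("darewin.yaml").write_text(CONFIG)
# mergekit's CLI entry point; writes the merged model to the output directory.
subprocess.run(
    ["mergekit-yaml", "darewin.yaml", "./darewin-7b-merge", "--copy-tokenizer"],
    check=True,
)
```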

Ideal Use Cases

  • General-purpose language generation: Its balanced performance makes it suitable for a wide array of text-based tasks.
  • Reasoning and question answering: Strong ARC, MMLU, and GSM8k scores suggest proficiency in multi-step reasoning and math word problems.
  • Applications requiring robust language understanding: Effective for tasks such as summarization, content creation, and conversational AI; a minimal loading sketch follows.
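
A minimal loading sketch with Hugging Face transformers follows; the prompt, sampling parameters, and the assumption that the tokenizer ships a chat template are illustrative rather than taken from this card.

```python
# Sketch: basic text generation with transformers. Sampling parameters are
# illustrative defaults, not settings recommended by the card.
import torch
from transformers import AutoTokenizer, pipeline

model_id = "mlabonne/Darewin-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain what a model merge is."}]
# Assumes the tokenizer defines a chat template; otherwise format the
# prompt manually in the format the model expects.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # matches the merge dtype noted above
    device_map="auto",
)
output = generator(
    prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
print(output[0]["generated_text"])
```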