Gille/StrangeMerges_27-7B-dare_ties

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Feb 21, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Gille/StrangeMerges_27-7B-dare_ties is a 7 billion parameter language model created by Gille, formed by merging eren23/ogno-monarch-jaskier-merge-7b-v2 and Gille/StrangeMerges_21-7B-slerp using the dare_ties method. This model achieves an average score of 76.17 on the Open LLM Leaderboard, demonstrating strong performance across various reasoning and language understanding benchmarks. With a context length of 4096 tokens, it is suitable for general-purpose text generation and conversational AI applications.


Model Overview

Gille/StrangeMerges_27-7B-dare_ties is a 7 billion parameter language model developed by Gille. It was produced by merging two models, eren23/ogno-monarch-jaskier-merge-7b-v2 and Gille/StrangeMerges_21-7B-slerp, using the dare_ties merge method. This method sparsifies each model's fine-tuning deltas and resolves sign conflicts between them before combining, with the aim of retaining the strengths of both constituent models.
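To give a sense of what the dare_ties method does, here is a minimal toy sketch in NumPy. It is an illustration of the general idea (DARE's drop-and-rescale of deltas, followed by TIES-style sign election), not the actual mergekit implementation used to build this model; all function names and the drop probability are assumptions for the example.

```python
import numpy as np

def dare_delta(base, finetuned, drop_prob, rng):
    """DARE step: randomly drop delta parameters with probability
    drop_prob and rescale the survivors by 1/(1 - drop_prob)."""
    delta = finetuned - base
    mask = rng.random(delta.shape) >= drop_prob  # keep with prob 1 - p
    return delta * mask / (1.0 - drop_prob)

def dare_ties_merge(base, finetuned_models, drop_prob=0.5, seed=0):
    """Toy dare_ties merge: sparsify each model's delta with DARE,
    then elect a majority sign per parameter (TIES) and average
    only the deltas that agree with that sign."""
    rng = np.random.default_rng(seed)
    deltas = np.stack(
        [dare_delta(base, ft, drop_prob, rng) for ft in finetuned_models]
    )
    # TIES sign election: the sign with the largest total magnitude wins.
    elected_sign = np.sign(deltas.sum(axis=0))
    agrees = (np.sign(deltas) == elected_sign) & (deltas != 0)
    summed = np.where(agrees, deltas, 0.0).sum(axis=0)
    counts = np.maximum(agrees.sum(axis=0), 1)  # avoid division by zero
    return base + summed / counts
```

In the degenerate case where nothing is dropped and both source models are identical, the merge simply recovers the fine-tuned weights; with real models, the random sparsification and sign election are what reduce interference between the two sets of deltas.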

Key Capabilities & Performance

This model demonstrates robust performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. It achieves an average score of 76.17, with notable results in specific areas:

  • AI2 Reasoning Challenge (25-shot): 73.72
  • HellaSwag (10-shot): 89.00
  • MMLU (5-shot): 64.50
  • TruthfulQA (0-shot): 76.36
  • Winogrande (5-shot): 84.61
  • GSM8k (5-shot): 68.84

These scores indicate strong capabilities in reasoning, common sense, language understanding, and mathematical problem-solving. The model supports a context length of 4096 tokens, making it suitable for tasks requiring moderate input and output lengths.
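Because the context window is 4096 tokens, applications that feed the model long conversations need to trim history so the prompt plus the expected reply fits in that budget. The sketch below shows one simple way to do this; the whitespace token counter is a stand-in assumption, and a real application would count tokens with the model's own tokenizer.

```python
def trim_to_context(messages, max_tokens=4096, reserve_for_output=512,
                    count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined (approximate) token
    count fits within the model's context window, leaving headroom for
    the generated reply. count_tokens is a naive stand-in here."""
    budget = max_tokens - reserve_for_output
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break  # older messages no longer fit
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Dropping the oldest turns first is the usual choice for chat applications, since recent context matters most for the next reply.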

Ideal Use Cases

Given its balanced performance across various benchmarks, Gille/StrangeMerges_27-7B-dare_ties is well-suited for:

  • General-purpose text generation
  • Conversational AI and chatbots
  • Reasoning and question-answering tasks
  • Applications requiring a capable 7B parameter model with good overall understanding