Gille/StrangeMerges_25-7B-dare_ties

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 18, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

Gille/StrangeMerges_25-7B-dare_ties is a 7 billion parameter language model created by Gille, formed by merging Gille/StrangeMerges_21-7B-slerp and bardsai/jaskier-7b-dpo-v5.6 using the dare_ties method. This model features a 4096-token context length and achieves an average score of 76.33 on the Open LLM Leaderboard, demonstrating capabilities across various reasoning and language understanding tasks. It is suitable for general-purpose text generation and understanding, particularly in scenarios requiring a blend of its constituent models' strengths.


Overview

StrangeMerges_25-7B-dare_ties is a 7 billion parameter language model developed by Gille. It is a merged model, combining the strengths of two base models: Gille/StrangeMerges_21-7B-slerp and bardsai/jaskier-7b-dpo-v5.6. The merge was performed using the dare_ties method via LazyMergekit, with specific density and weight parameters applied to each contributing model.
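A dare_ties merge via LazyMergekit is driven by a mergekit-style YAML configuration along these lines. This is an illustrative sketch only: the density and weight values, dtype, and base model below are placeholders, not the parameters the author actually used.

```yaml
models:
  - model: Gille/StrangeMerges_21-7B-slerp
    parameters:
      density: 0.5   # illustrative value, not the card's actual setting
      weight: 0.5    # illustrative value
  - model: bardsai/jaskier-7b-dpo-v5.6
    parameters:
      density: 0.5   # illustrative value
      weight: 0.5    # illustrative value
merge_method: dare_ties
base_model:          # shared base model; not stated in this card
dtype: bfloat16      # assumed; the card does not specify
```

Each contributing model gets its own density (fraction of its delta from the base that survives sparsification) and weight (its scale in the final combination).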

Key Capabilities

  • General Language Understanding: Achieves an average score of 76.33 on the Open LLM Leaderboard, indicating strong performance across a range of benchmarks.
  • Reasoning: Scores 73.46 on the AI2 Reasoning Challenge (25-Shot) and 70.43 on GSM8k (5-shot).
  • Common Sense & Factual Knowledge: Demonstrates capabilities with 88.89 on HellaSwag (10-Shot) and 76.54 on TruthfulQA (0-shot).
  • Instruction Following: Suitable for general text generation tasks, though the merge inherited a tendency to occasionally emit repeated "INST" token artifacts from its parent models.

Performance Benchmarks

Detailed evaluation results are available on the Open LLM Leaderboard. Notable scores include:

  • Avg.: 76.33
  • MMLU (5-Shot): 64.37
  • Winogrande (5-shot): 84.29

Good For

This model suits developers seeking a 7B-parameter model with a balanced performance profile across language tasks, inheriting strengths from both of its parent models. Its 4096-token context length accommodates moderately sized inputs for diverse applications.
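For intuition about how the dare_ties method combines the two parents, the procedure (randomly Drop And REscale each parent's task vector, then a TIES-style sign election before averaging) can be sketched for a single flat weight vector as follows. This is a simplified, hypothetical illustration, not LazyMergekit's actual implementation; the function names and signatures are invented for this sketch.

```python
import random

def dare(delta, density, rng):
    """DARE: drop each delta entry with probability (1 - density) and
    rescale survivors by 1/density, so the expected delta is unchanged."""
    return [d / density if rng.random() < density else 0.0 for d in delta]

def dare_ties_merge(base, finetuned_models, densities, weights, seed=0):
    """Sketch of a dare_ties merge over one flat weight vector per model."""
    rng = random.Random(seed)
    # Per-model task vectors (delta from the shared base), sparsified by
    # DARE and scaled by each model's merge weight.
    deltas = []
    for ft, density, w in zip(finetuned_models, densities, weights):
        delta = [f - b for f, b in zip(ft, base)]
        deltas.append([w * d for d in dare(delta, density, rng)])

    merged = []
    for i, b in enumerate(base):
        vals = [d[i] for d in deltas if d[i] != 0.0]
        if not vals:
            merged.append(b)  # no surviving contribution: keep base weight
            continue
        # TIES-style sign election: keep only contributions agreeing with
        # the sign of the summed delta, then average the survivors.
        sign_pos = sum(vals) >= 0
        kept = [v for v in vals if (v >= 0) == sign_pos]
        merged.append(b + sum(kept) / len(kept))
    return merged
```

With density 0.5 each surviving entry is doubled, which is why sparsifying roughly half of each delta still preserves the merged model's expected behavior.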