Gille/StrangeMerges_2-7B-slerp
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 27, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Gille/StrangeMerges_2-7B-slerp is a 7 billion parameter language model created by Gille, formed by merging Gille/StrangeMerges_1-7B-slerp and Keynote-Technology/KAI-7B-v0.1 using spherical linear interpolation (slerp). The model achieves a strong average score of 69.34 on the Open LLM Leaderboard, with notable results on reasoning and common sense benchmarks. It is suitable for general language generation tasks requiring a balance of reasoning and factual recall within its 4096 token context length.


Model Overview

Gille/StrangeMerges_2-7B-slerp is a 7 billion parameter language model developed by Gille. It is a product of merging two distinct models, Gille/StrangeMerges_1-7B-slerp and Keynote-Technology/KAI-7B-v0.1, utilizing a spherical linear interpolation (slerp) merge method. This approach combines the strengths of its constituent models to achieve balanced performance.
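To illustrate the idea behind a slerp merge, the sketch below interpolates two weight vectors along the arc of the hypersphere rather than along a straight line, which is what distinguishes slerp from plain linear averaging. This is a simplified, self-contained illustration of the math only, not the actual merge pipeline used to produce this model (real merges operate tensor-by-tensor over full checkpoints, typically via tooling such as mergekit).

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t values follow the
    arc between them, preserving the angular "direction" of both.
    """
    # Angle between the two vectors
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1)))
    omega = math.acos(cos_omega)

    if omega < eps:
        # Nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Midpoint between two orthogonal unit vectors lies on the arc:
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)  # ≈ [0.7071, 0.7071]
```

Unlike a straight average (which would give [0.5, 0.5] and shrink the vector's norm), the slerp midpoint stays on the unit sphere, which is why merge recipes often prefer it for blending model weights.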

Key Capabilities

  • Merged Architecture: Combines two base models, Gille/StrangeMerges_1-7B-slerp and Keynote-Technology/KAI-7B-v0.1, through a slerp merge, allowing for a blend of their respective characteristics.
  • Reasoning and Common Sense: Achieves competitive scores on various benchmarks, including 66.89 on AI2 Reasoning Challenge and 82.40 on Winogrande, indicating strong reasoning and common sense abilities.
  • General Language Tasks: With an average score of 69.34 on the Open LLM Leaderboard, it is well-suited for a broad range of text generation and understanding applications.

Performance Highlights

The model's performance on the Open LLM Leaderboard includes:

  • Avg.: 69.34
  • AI2 Reasoning Challenge (25-Shot): 66.89
  • HellaSwag (10-Shot): 85.52
  • MMLU (5-Shot): 65.22
  • TruthfulQA (0-shot): 54.53
  • Winogrande (5-shot): 82.40
  • GSM8k (5-shot): 61.49

Usage

This model can be loaded in Python projects via the Hugging Face transformers library for text generation tasks, and supports a context length of 4096 tokens.
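A minimal loading sketch with transformers is shown below. The prompt string and generation parameters are illustrative choices, not values prescribed by the model card; running this will download the full 7B checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gille/StrangeMerges_2-7B-slerp"

def generate(prompt, max_new_tokens=128):
    # Load tokenizer and model weights from the Hub
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Tokenize, generate, and decode the completion
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Explain spherical linear interpolation in one sentence."))
```

Keep prompts plus expected output within the 4096 token context window; `device_map="auto"` requires the accelerate package and places the weights on available GPUs automatically.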