Gille/StrangeMerges_12-7B-slerp

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Jan 30, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

Gille/StrangeMerges_12-7B-slerp is a 7-billion-parameter language model created by Gille by merging Keynote-Technology/KAI-7B-v0.1 and Gille/StrangeMerges_11-7B-slerp with the slerp (spherical linear interpolation) merge method. The model scores an average of 69.13 on the Open LLM Leaderboard, indicating strong performance across reasoning and language-understanding tasks. With a 4096-token context length, it is suitable for general-purpose text generation and conversational AI applications.


StrangeMerges_12-7B-slerp Overview

StrangeMerges_12-7B-slerp is a 7-billion-parameter language model developed by Gille. It was produced with LazyMergekit by merging two parent models, Keynote-Technology/KAI-7B-v0.1 and Gille/StrangeMerges_11-7B-slerp, using the slerp (spherical linear interpolation) method, with separate interpolation-factor (t) schedules applied to the self-attention and MLP layers.
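Slerp interpolates between the two parent checkpoints along the great-circle arc connecting their weight vectors rather than along a straight line, which better preserves the geometry of the weights. For parent weights $p_0$ and $p_1$ separated by angle $\Omega$, the merged weights at interpolation factor $t \in [0, 1]$ are:

$$\mathrm{slerp}(p_0, p_1; t) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin \Omega}\, p_0 + \frac{\sin(t\,\Omega)}{\sin \Omega}\, p_1$$

The exact merge configuration is not reproduced here, so the YAML below is only an illustrative sketch of a typical LazyMergekit slerp config for this model pair; the layer ranges and t schedules are assumptions, not the values actually used for this merge.

```yaml
# Illustrative LazyMergekit slerp config. The layer_range and t values
# below are typical placeholders, not the exact StrangeMerges_12 settings.
slices:
  - sources:
      - model: Keynote-Technology/KAI-7B-v0.1
        layer_range: [0, 32]
      - model: Gille/StrangeMerges_11-7B-slerp
        layer_range: [0, 32]
merge_method: slerp
base_model: Keynote-Technology/KAI-7B-v0.1
parameters:
  t:
    - filter: self_attn          # separate schedule for attention weights
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp                # separate schedule for MLP weights
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5                 # default for all remaining tensors
dtype: bfloat16
```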

Key Capabilities & Performance

This model demonstrates solid performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. It achieved an average score of 69.13, with notable results including:

  • AI2 Reasoning Challenge (25-shot): 66.64
  • HellaSwag (10-shot): 85.89
  • MMLU (5-shot): 64.94
  • TruthfulQA (0-shot): 52.55
  • Winogrande (5-shot): 81.69
  • GSM8K (5-shot): 63.08

These scores indicate solid commonsense reasoning (HellaSwag, Winogrande), broad language understanding (ARC, MMLU), and mathematical problem-solving (GSM8K).

Use Cases

StrangeMerges_12-7B-slerp is well suited for general text generation, conversational AI, and applications that need robust reasoning within a 7B-parameter footprint. Its 4096-token context length supports moderately long inputs.
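As a concrete starting point, here is a minimal sketch of loading the model with the Hugging Face transformers library; the dtype, device placement, and generation parameters are illustrative choices, not settings prescribed by the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gille/StrangeMerges_12-7B-slerp"

# Standard Hugging Face loading; float16 roughly halves memory vs. float32.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # place layers on available GPU(s), else CPU
)

prompt = "Briefly explain what a model merge is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt plus generated tokens within the 4096-token context window.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```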