Gille/StrangeMerges_4-7B-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Jan 27, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

StrangeMerges_4-7B-slerp is a 7 billion parameter language model developed by Gille, created by merging Gille/StrangeMerges_3-7B-slerp and Gille/StrangeMerges_2-7B-slerp with the SLERP (spherical linear interpolation) method. The model achieves an average score of 72.63 on the Open LLM Leaderboard, with notable results on HellaSwag (87.01) and Winogrande (82.95). It is designed for general language generation tasks, leveraging its merged architecture for balanced capabilities.

Model Overview

Gille/StrangeMerges_4-7B-slerp is a 7 billion parameter language model produced by merging two predecessor models: Gille/StrangeMerges_3-7B-slerp and Gille/StrangeMerges_2-7B-slerp. The merge uses SLERP (spherical linear interpolation) with interpolation weights that vary across layers, applied separately to the self-attention and MLP blocks, so as to balance the respective strengths of the two parents.
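
For intuition, SLERP interpolates along the great-circle arc between two weight tensors rather than along the straight line between them, which preserves directional structure that plain averaging can wash out. The sketch below is an illustrative NumPy implementation of the per-tensor operation, not the actual merge tooling's code; the interpolation factor `t` stands in for the varying per-layer values the merge configuration assigns to the self-attention and MLP blocks.

```python
import numpy as np

def slerp(t: float, w0: np.ndarray, w1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors (illustrative)."""
    v0, v1 = w0.ravel(), w1.ravel()
    # Measure the angle between the tensors via their unit vectors.
    n0 = v0 / (np.linalg.norm(v0) + eps)
    n1 = v1 / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(n0, n1), -1.0, 1.0))
    # Nearly colinear tensors: fall back to plain linear interpolation.
    if np.sin(omega) < eps:
        return (1.0 - t) * w0 + t * w1
    s0 = np.sin((1.0 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return (s0 * v0 + s1 * v1).reshape(w0.shape)

# Example: blend a weight matrix 30% of the way toward the second parent.
merged = slerp(0.3, np.random.randn(64, 64), np.random.randn(64, 64))
```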

Key Capabilities & Performance

This model has been evaluated on the Open LLM Leaderboard, achieving a competitive average score of 72.63. Its performance highlights include:

  • HellaSwag (10-Shot): 87.01
  • Winogrande (5-Shot): 82.95
  • AI2 Reasoning Challenge (25-Shot): 69.45
  • GSM8k (5-Shot): 68.61
  • MMLU (5-Shot): 65.33
  • TruthfulQA (0-Shot): 62.40

These scores indicate solid capability across reasoning, commonsense, and language-understanding tasks. The model supports a context length of 4096 tokens and is intended for general text generation applications.

Usage

Developers can integrate StrangeMerges_4-7B-slerp into their projects using the Hugging Face transformers library, as shown in the sketch below. The model is compatible with the bfloat16 data type and can be run in GPU-accelerated environments with device_map="auto" for efficient resource utilization.
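
A minimal loading-and-generation sketch using the standard transformers API follows; the prompt and generation parameters are illustrative choices, not recommendations from the model author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gille/StrangeMerges_4-7B-slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16 halves memory relative to fp32
    device_map="auto",           # place layers across available accelerators
)

prompt = "Explain spherical linear interpolation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```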