Gille/StrangeMerges_20-7B-slerp
Gille/StrangeMerges_20-7B-slerp is a 7 billion parameter language model created by Gille, formed by merging flemmingmiguel/MBX-7B-v3 and Gille/StrangeMerges_11-7B-slerp using the slerp method. This model achieves an average score of 75.52 on the Open LLM Leaderboard, demonstrating strong general reasoning and language understanding capabilities across various benchmarks. With a 4096-token context length, it is suitable for a wide range of general-purpose natural language processing tasks.
Overview
Gille/StrangeMerges_20-7B-slerp is a 7 billion parameter language model developed by Gille. It was produced by merging two models: flemmingmiguel/MBX-7B-v3 and Gille/StrangeMerges_11-7B-slerp. The merge was performed with slerp (spherical linear interpolation), which interpolates along the arc between weight vectors rather than averaging them linearly, preserving the magnitude of the blended weights.
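The slerp operation named above can be sketched for a single pair of weight tensors. This is a minimal illustration of the interpolation formula, not the actual merge tooling; the function name and epsilon handling are the author's own choices here.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t values move
    along the arc between the two directions.
    """
    # Angle is measured between the normalized directions.
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

In an actual model merge this interpolation is applied tensor by tensor across the two checkpoints, with t controlling how far the result leans toward the second model.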
Key Capabilities & Performance
This model demonstrates robust performance across several benchmarks, as evaluated on the Open LLM Leaderboard. It achieves an average score of 75.52, indicating strong general language understanding and reasoning abilities. Specific benchmark results include:
- AI2 Reasoning Challenge (25-shot): 73.12
- HellaSwag (10-shot): 88.45
- MMLU (5-shot): 65.06
- TruthfulQA (0-shot): 70.90
- Winogrande (5-shot): 83.43
- GSM8k (5-shot): 72.18
These scores highlight its proficiency across diverse tasks, from commonsense reasoning and factual recall to mathematical problem-solving.
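The "k-shot" labels above mean that k solved examples are prepended to each test question before the model answers. A minimal sketch of that prompt construction (the function name and Q/A template are illustrative assumptions, not the leaderboard's exact harness):

```python
def build_few_shot_prompt(examples, question, k=5):
    """Prepend k solved (question, answer) pairs before the target question.

    `examples` is a list of (question, answer) tuples; the model is expected
    to continue the text after the final "A:".
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

A 0-shot benchmark like TruthfulQA corresponds to k=0: the model sees only the target question with no worked examples.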
Configuration Details
The merge process drew on specific layer ranges from both source models and applied varying t values to the self-attention and MLP layers, with an overall t value of 0.45. The model is configured to use the bfloat16 data type for memory efficiency.
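A configuration of this kind is typically expressed as a mergekit-style YAML file. The fragment below is an illustrative sketch only: the layer ranges, base model choice, and per-filter t schedules are placeholders, since the card states only that t varies by layer type with an overall value of 0.45.

```yaml
slices:
  - sources:
      - model: flemmingmiguel/MBX-7B-v3
        layer_range: [0, 32]        # placeholder range
      - model: Gille/StrangeMerges_11-7B-slerp
        layer_range: [0, 32]        # placeholder range
merge_method: slerp
base_model: flemmingmiguel/MBX-7B-v3  # assumed base model
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]  # placeholder schedule
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]  # placeholder schedule
    - value: 0.45                   # overall t stated in the card
dtype: bfloat16
```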
Good For
- General-purpose natural language processing tasks.
- Applications requiring strong reasoning and common sense understanding.
- Scenarios where a balanced performance across multiple benchmarks is desired.