Gille/StrangeMerges_10-7B-slerp is a 7 billion parameter language model created by Gille, formed by merging flemmingmiguel/MBX-7B-v3 and Gille/StrangeMerges_9-7B-dare_ties using a slerp method. This model demonstrates strong general reasoning capabilities, achieving an average score of 74.77 on the Open LLM Leaderboard across various benchmarks. With a 4096 token context length, it is suitable for a range of general-purpose text generation and understanding tasks.
StrangeMerges_10-7B-slerp Overview
StrangeMerges_10-7B-slerp is a 7-billion-parameter language model developed by Gille. It was produced by merging two source models, flemmingmiguel/MBX-7B-v3 and Gille/StrangeMerges_9-7B-dare_ties, using the slerp (spherical linear interpolation) merge method via LazyMergekit. Unlike plain linear averaging, slerp interpolates along the arc between corresponding weight tensors, preserving their scale while blending the source models' strengths.
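To make the interpolation concrete, here is a minimal sketch of slerp on two flat weight vectors. This is an illustration of the math only, not mergekit's actual implementation, which operates on full tensors with per-layer t schedules and additional edge-case handling:

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t values move along
    the arc between the two directions rather than a straight line.
    """
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    # Angle between the two vectors, clamped for numerical safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

For orthogonal unit vectors, slerp(0.5, [1, 0], [0, 1]) gives roughly [0.707, 0.707], a unit-length midpoint, whereas linear averaging would give [0.5, 0.5] and shrink the norm.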
Key Capabilities & Performance
The model exhibits robust performance across several benchmarks on the Open LLM Leaderboard, achieving an average score of 74.77. Specific benchmark results include:
- AI2 Reasoning Challenge (25-Shot): 72.35
- HellaSwag (10-Shot): 88.30
- MMLU (5-Shot): 64.87
- TruthfulQA (0-shot): 69.49
- Winogrande (5-shot): 83.50
- GSM8k (5-shot): 70.13
These scores indicate strong general reasoning, common-sense inference, and language understanding, making the model a versatile choice for general-purpose text generation and comprehension tasks. It operates with a context length of 4096 tokens.
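The 74.77 figure reported above is simply the unweighted mean of the six benchmark scores, which is easy to verify:

```python
# Open LLM Leaderboard scores listed above.
scores = {
    "ARC (25-shot)": 72.35,
    "HellaSwag (10-shot)": 88.30,
    "MMLU (5-shot)": 64.87,
    "TruthfulQA (0-shot)": 69.49,
    "Winogrande (5-shot)": 83.50,
    "GSM8k (5-shot)": 70.13,
}
# The leaderboard average is the plain arithmetic mean.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # → 74.77
```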
Configuration Details
The merge configuration specifies the layer range taken from each source model and separate interpolation weights (t values) for the self-attention and MLP layers, so the two models' characteristics can be blended differently per component. The merged weights are stored in bfloat16.
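A LazyMergekit slerp configuration of this general shape is sketched below. The layer ranges, t schedules, and choice of base model here are illustrative placeholders, not the model's actual values; consult the model card for the real configuration.

```yaml
slices:
  - sources:
      - model: flemmingmiguel/MBX-7B-v3
        layer_range: [0, 32]          # illustrative range, not the actual one
      - model: Gille/StrangeMerges_9-7B-dare_ties
        layer_range: [0, 32]          # illustrative range, not the actual one
merge_method: slerp
base_model: flemmingmiguel/MBX-7B-v3  # assumed; slerp requires a base model
parameters:
  t:
    # Separate interpolation schedules for attention and MLP layers,
    # as described above. Values are placeholders.
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5                      # default t for all other tensors
dtype: bfloat16
```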