Overview
Gille/StrangeMerges_16-7B-slerp is a 7-billion-parameter language model developed by Gille. It was produced by merging two base models, Gille/StrangeMerges_15-7B-slerp and SanjiWatsuki/Kunoichi-7B, using the slerp (spherical linear interpolation) method, with the interpolation weight varied between self-attention and MLP layers. The model supports a context length of 4096 tokens.
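Slerp interpolates along the arc between two models' weight tensors rather than along the straight line used by plain weighted averaging, which better preserves the overall scale of the weights. The sketch below, using NumPy arrays as stand-ins for model weights, shows the core computation; the `merge_tensor` helper and its per-layer weightings are hypothetical illustrations, not the actual merge configuration.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two tensors with factor t in [0, 1]."""
    # Flatten and normalize copies to measure the angle between the tensors.
    u0 = v0.ravel() / (np.linalg.norm(v0) + eps)
    u1 = v1.ravel() / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between tensors
    so = np.sin(omega)
    if so < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    # Weighted combination along the great arc connecting v0 and v1.
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1

def merge_tensor(name: str, w0: np.ndarray, w1: np.ndarray) -> np.ndarray:
    # Hypothetical per-layer schedule; the real merge used its own weightings.
    if "self_attn" in name:
        t = 0.4
    elif "mlp" in name:
        t = 0.6
    else:
        t = 0.5
    return slerp(t, w0, w1)
```

Merges like this are typically driven by a configuration file (e.g. for mergekit) that maps interpolation weights to layer types; the helper above only mirrors that idea.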
Key Capabilities & Performance
On the Open LLM Leaderboard, the model achieves an average score of 72.80, indicating strong general language understanding and reasoning. Individual benchmark results:
- AI2 Reasoning Challenge (25-shot): 69.03
- HellaSwag (10-shot): 87.15
- MMLU (5-shot): 65.65
- TruthfulQA (0-shot): 62.97
- Winogrande (5-shot): 81.29
- GSM8K (5-shot): 70.74
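The reported average is simply the mean of these six scores: (69.03 + 87.15 + 65.65 + 62.97 + 81.29 + 70.74) / 6 ≈ 72.80.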
Good For
- General-purpose natural language processing tasks requiring solid reasoning and language comprehension.
- Applications where a 7B-parameter model with a 4096-token context window strikes the right balance between quality and computational cost (a minimal loading sketch follows this list).
- Users looking for a model with a balanced performance profile across diverse benchmarks, including common sense reasoning, reading comprehension, and mathematical problem-solving.
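For such uses, the model should load through the standard Hugging Face transformers causal-LM interface; the snippet below is a minimal sketch, with the dtype, device placement, and generation settings chosen for illustration rather than taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gille/StrangeMerges_16-7B-slerp"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B weights near ~14 GB
    device_map="auto",          # requires the accelerate package
)

prompt = "Summarize what spherical linear interpolation does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```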