saishf/Kuro-Lotus-10.7B
saishf/Kuro-Lotus-10.7B is a 10.7 billion parameter language model created by saishf through a SLERP merge of BlueNipples/SnowLotus-v2-10.7B and Himitsui/KuroMitsu-11B. This model is designed for general language tasks, leveraging the combined strengths of its constituent models. It features a 4096-token context length and demonstrates competitive performance across various benchmarks, including an average score of 71.90 on the Open LLM Leaderboard.
Kuro-Lotus-10.7B Overview
Kuro-Lotus-10.7B is a 10.7 billion parameter language model developed by saishf. It was created using the SLERP merge method from two distinct pre-trained models: BlueNipples/SnowLotus-v2-10.7B and Himitsui/KuroMitsu-11B. This merging technique aims to combine the strengths of the base models to achieve enhanced performance across a range of language understanding and generation tasks.
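As a rough illustration of the idea behind a SLERP merge (this is a minimal sketch, not the mergekit implementation actually used), spherical linear interpolation blends two weight vectors along the arc between them rather than along a straight line, preserving their magnitude-direction structure better than plain averaging:

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    Illustrative sketch only; real merge tooling (e.g. mergekit)
    handles per-tensor details and degenerate cases beyond this.
    """
    # Angle between the two vectors via their normalized dot product.
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))  # clamp against float drift
    theta = math.acos(dot)
    if abs(theta) < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# t=0 returns the first model's weights; t=1 returns the second's.
print(slerp(0.0, [1.0, 0.0], [0.0, 1.0]))  # [1.0, 0.0]
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))  # ≈ [0.7071, 0.7071]
```

The interpolation factor `t` controls how much each parent contributes at a given layer; a merge can vary `t` per layer, which is what the configuration described below does.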
Key Capabilities & Performance
This model demonstrates solid performance on standard benchmarks, as evaluated on the Hugging Face Open LLM Leaderboard. Its average score is 71.90, with the following per-benchmark results:
- AI2 Reasoning Challenge (25-shot): 68.69
- HellaSwag (10-shot): 87.51
- MMLU (5-shot): 66.64
- TruthfulQA (0-shot): 58.27
- Winogrande (5-shot): 84.21
- GSM8K (5-shot): 66.11
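As a quick arithmetic check, the reported leaderboard average is the mean of the six benchmark scores above (71.905 exactly, reported as 71.90):

```python
# Benchmark scores from the Open LLM Leaderboard listing above.
scores = {
    "ARC (25-shot)": 68.69,
    "HellaSwag (10-shot)": 87.51,
    "MMLU (5-shot)": 66.64,
    "TruthfulQA (0-shot)": 58.27,
    "Winogrande (5-shot)": 84.21,
    "GSM8K (5-shot)": 66.11,
}
# The exact mean of these values is 71.905.
average = sum(scores.values()) / len(scores)
print(average)
```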
These scores indicate proficiency in reasoning, commonsense inference, factual recall, and mathematical problem-solving. The model's architecture supports a context length of 4096 tokens.
Merge Details
The SLERP merge applied per-layer interpolation values, with separate schedules for the self-attention and MLP blocks and a default interpolation value of 0.5 elsewhere. Himitsui/KuroMitsu-11B served as the base model, and the merge was performed in the bfloat16 data type.
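A mergekit SLERP configuration of this shape might look like the following. Note that the per-layer `t` schedules and the 48-layer range shown here are illustrative placeholders, not the exact values used for this merge:

```yaml
slices:
  - sources:
      - model: BlueNipples/SnowLotus-v2-10.7B
        layer_range: [0, 48]   # assumes a 48-layer model; illustrative
      - model: Himitsui/KuroMitsu-11B
        layer_range: [0, 48]
merge_method: slerp
base_model: Himitsui/KuroMitsu-11B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # illustrative per-layer schedule
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # illustrative per-layer schedule
    - value: 0.5                     # default interpolation elsewhere
dtype: bfloat16
```

The `filter` entries let the merge weight attention and MLP sublayers differently at each depth, while the final unfiltered `value: 0.5` covers all remaining tensors, matching the configuration described above.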