saishf/Kuro-Lotus-10.7B

Text Generation · Concurrency Cost: 1 · Model Size: 10.7B · Quant: FP8 · Ctx Length: 4K · Published: Feb 3, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open Weights

saishf/Kuro-Lotus-10.7B is a 10.7 billion parameter language model created by saishf through a SLERP merge of BlueNipples/SnowLotus-v2-10.7B and Himitsui/KuroMitsu-11B. This model is designed for general language tasks, leveraging the combined strengths of its constituent models. It features a 4096-token context length and demonstrates competitive performance across various benchmarks, including an average score of 71.90 on the Open LLM Leaderboard.


Kuro-Lotus-10.7B Overview

Kuro-Lotus-10.7B is a 10.7 billion parameter language model developed by saishf. It was created using the SLERP merge method from two distinct pre-trained models: BlueNipples/SnowLotus-v2-10.7B and Himitsui/KuroMitsu-11B. This merging technique aims to combine the strengths of the base models to achieve enhanced performance across a range of language understanding and generation tasks.
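SLERP (spherical linear interpolation) blends two models by interpolating along the arc between their weight tensors rather than along a straight line, which tends to preserve the magnitude structure of the weights better than plain averaging. A minimal sketch of the idea in NumPy (illustrative only, not mergekit's exact implementation):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t follows the arc
    defined by the angle between the (normalized) flattened tensors.
    """
    v0f = v0.ravel().astype(np.float64)
    v1f = v1.ravel().astype(np.float64)
    # Angle between the two weight vectors, computed on unit vectors
    u0 = v0f / np.linalg.norm(v0f)
    u1 = v1f / np.linalg.norm(v1f)
    dot = np.clip(np.dot(u0, u1), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        return ((1.0 - t) * v0f + t * v1f).reshape(v0.shape)
    s = np.sin(theta)
    out = (np.sin((1.0 - t) * theta) / s) * v0f + (np.sin(t * theta) / s) * v1f
    return out.reshape(v0.shape)
```

In a real merge this function would be applied per-tensor across both checkpoints, with `t` varying by layer and module type as described in the merge details below.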

Key Capabilities & Performance

This model demonstrates solid performance on standard benchmarks, as evaluated on the Hugging Face Open LLM Leaderboard. Its average score is 71.90, with specific results including:

  • AI2 Reasoning Challenge (25-shot): 68.69
  • HellaSwag (10-shot): 87.51
  • MMLU (5-shot): 66.64
  • TruthfulQA (0-shot): 58.27
  • Winogrande (5-shot): 84.21
  • GSM8k (5-shot): 66.11

These scores indicate its proficiency in reasoning, common sense, factual recall, and mathematical problem-solving. The model's architecture supports a context length of 4096 tokens.

Merge Details

The merge used a SLERP configuration that applies different interpolation weights to the self-attention and MLP blocks across layers, with a default interpolation value of 0.5 elsewhere. Himitsui/KuroMitsu-11B served as the base model, and the merge was performed in bfloat16.
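The card does not reproduce the full merge recipe, but a mergekit SLERP config of the shape described might look like the following sketch. The layer range and the per-filter interpolation curves are assumptions for illustration; only the 0.5 default, the base model, and the bfloat16 dtype are stated on the card.

```yaml
# Illustrative mergekit SLERP config (values beyond those stated are assumed)
slices:
  - sources:
      - model: BlueNipples/SnowLotus-v2-10.7B
        layer_range: [0, 48]
      - model: Himitsui/KuroMitsu-11B
        layer_range: [0, 48]
merge_method: slerp
base_model: Himitsui/KuroMitsu-11B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # assumed per-layer curve
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # assumed per-layer curve
    - value: 0.5                      # default interpolation, per the card
dtype: bfloat16
```

Per-filter `t` lists let the attention and MLP weights favor different parents at different depths, which is the "varying interpolation values across layers" behavior the merge details describe.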