occultml/Helios-10.7B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 10.7B · Quant: FP8 · Ctx Length: 8k · Published: Dec 31, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

occultml/Helios-10.7B is a 10.7 billion parameter language model, created by occultml, built by merging jeonsworld/CarbonVillain-en-10.7B-v4 and kekmodel/StopCarbon-10.7B-v5 using MergeKit. The model features an 8192 token context length and delivers balanced performance across reasoning and common-sense benchmarks, achieving an average score of 42.19 on the Open LLM Leaderboard. It is suitable for general-purpose language tasks that require robust understanding and generation.


Helios-10.7B: A Merged Language Model

Helios-10.7B is a 10.7 billion parameter language model developed by occultml, constructed by merging two models: jeonsworld/CarbonVillain-en-10.7B-v4 and kekmodel/StopCarbon-10.7B-v5. The merge was performed with MergeKit, using the slerp method and a bfloat16 data type.
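The slerp (spherical linear interpolation) method blends corresponding weight tensors from the two parent models along the surface of a hypersphere rather than along a straight line, which tends to preserve the geometry of the weights better than plain averaging. The sketch below is an illustrative plain-Python version of the operation applied per tensor, not MergeKit's actual implementation; `slerp` and its parameters are hypothetical names for this example.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    Illustrative sketch of the per-tensor operation a slerp merge
    performs; a real merge applies this to every parameter tensor
    of the two source models, with t controlling the blend ratio.
    """
    # Compute the angle between the two vectors via their normalized dot product.
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    dot = max(-1.0, min(1.0, dot))  # clamp for numerical safety
    omega = math.acos(dot)
    if abs(math.sin(omega)) < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Weight each endpoint so the result stays on the great-circle arc.
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

At `t = 0` the result equals the first model's weights, at `t = 1` the second's, and intermediate values trace the arc between them.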

Key Capabilities & Performance

This model demonstrates solid performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. It achieves an average score of 42.19, indicating its general proficiency in language understanding and reasoning tasks. Notable benchmark results include:

  • HellaSwag (10-Shot): 46.60
  • MMLU (5-Shot): 41.40
  • TruthfulQA (0-shot): 55.52
  • Winogrande (5-shot): 70.72
  • AI2 Reasoning Challenge (25-Shot): 38.91

With an 8192 token context length, Helios-10.7B is well-suited to tasks that involve moderately long inputs.
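In practice the 8192-token window must hold both the prompt and the generated output, so long prompts need trimming before inference. The helper below is a hypothetical sketch (the function name and the stand-in token-ID list are assumptions, not part of any library) of one simple budgeting strategy: keep the most recent tokens and reserve room for generation.

```python
def fit_to_context(token_ids, max_ctx=8192, max_new_tokens=512):
    """Trim a tokenized prompt so prompt + generation fits the window.

    Hypothetical helper: token_ids stands in for the output of a real
    tokenizer. Keeps the tail of the prompt, reserving max_new_tokens
    of the max_ctx window for the model's generated output.
    """
    budget = max_ctx - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exhausts the context window")
    # Slicing from the end keeps the most recent (usually most relevant) tokens.
    return token_ids[-budget:]
```

Other strategies (dropping the middle of the prompt, or summarizing older turns) trade simplicity for better retention of early context.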

Good for

  • General-purpose text generation and understanding: Its balanced performance across various benchmarks makes it a versatile choice.
  • Applications requiring common sense reasoning: Demonstrated by its scores on HellaSwag and Winogrande.
  • Tasks benefiting from a merged architecture: Leveraging the strengths of its constituent models for improved overall capability.