occultml/Helios-10.7B-v2

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Dec 31, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

Helios-10.7B-v2 by occultml is an 8 billion parameter language model, merged from jeonsworld/CarbonVillain-en-10.7B-v2 and kekmodel/StopCarbon-10.7B-v5 using the slerp merge method. The model features an 8192-token context length, is designed for general language tasks, and scores 42.25 on average on the Open LLM Leaderboard. Its merged architecture aims to combine the strengths of its constituent models for balanced performance across benchmarks.


Helios-10.7B-v2: Merged Language Model

Helios-10.7B-v2 is an 8 billion parameter language model developed by occultml, created through a strategic merge of two distinct models: jeonsworld/CarbonVillain-en-10.7B-v2 and kekmodel/StopCarbon-10.7B-v5. This merge was performed using mergekit with a slerp (spherical linear interpolation) method, aiming to combine the capabilities of its base models.
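A mergekit slerp merge of this kind is driven by a YAML config. The sketch below shows the general shape such a config might take for these two models; the layer ranges, base model choice, t schedules, and dtype are illustrative assumptions, not the actual configuration used for Helios-10.7B-v2:

```yaml
# Illustrative mergekit slerp config (values are assumptions, not the released config)
slices:
  - sources:
      - model: jeonsworld/CarbonVillain-en-10.7B-v2
        layer_range: [0, 48]
      - model: kekmodel/StopCarbon-10.7B-v5
        layer_range: [0, 48]
merge_method: slerp
base_model: jeonsworld/CarbonVillain-en-10.7B-v2
parameters:
  t:
    - filter: self_attn        # per-layer t schedule for attention weights
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp              # a different schedule for MLP weights
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5               # default t for everything else
dtype: float16
```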

Key Characteristics

  • Architecture: A merged model combining CarbonVillain-en-10.7B-v2 and StopCarbon-10.7B-v5.
  • Parameter Count: Approximately 8 billion parameters.
  • Context Length: Supports an 8192 token context window.
  • Merge Method: Utilizes slerp for parameter interpolation, with distinct interpolation weights (t values) applied to the self-attention and MLP layers.
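Slerp interpolates along the arc between two weight tensors rather than along the straight line, which preserves the magnitude of the merged weights better than plain averaging. A minimal NumPy sketch of the operation on a pair of flattened tensors (the function and its fallback behavior are illustrative, not mergekit's actual code):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values move along
    the great-circle arc between the (normalized) directions of v0 and v1.
    """
    v0_n = v0 / np.linalg.norm(v0)
    v1_n = v1 / np.linalg.norm(v1)
    # Angle between the two tensors, clipped for numerical safety
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    omega = np.arccos(dot)
    if abs(np.sin(omega)) < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        return (1.0 - t) * v0 + t * v1
    s0 = np.sin((1.0 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return s0 * v0 + s1 * v1
```

In a real merge this is applied tensor-by-tensor across both checkpoints, with t chosen per layer type as described above.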

Performance Overview

Evaluated on the Open LLM Leaderboard, Helios-10.7B-v2 achieved an average score of 42.25. Specific benchmark results include:

  • AI2 Reasoning Challenge (25-shot): 39.16
  • HellaSwag (10-shot): 46.63
  • MMLU (5-shot): 41.57
  • TruthfulQA (0-shot): 55.51
  • Winogrande (5-shot): 70.64

Notably, the model scored 0.00 on GSM8k (5-shot), indicating it is not optimized for complex mathematical reasoning tasks. Its balanced performance across other benchmarks suggests suitability for general language understanding and generation tasks where a broad range of capabilities is desired.
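The reported 42.25 average is simply the unweighted mean of the six leaderboard benchmarks, with the 0.00 GSM8k result included:

```python
# Open LLM Leaderboard scores reported for Helios-10.7B-v2
scores = {
    "ARC (25-shot)": 39.16,
    "HellaSwag (10-shot)": 46.63,
    "MMLU (5-shot)": 41.57,
    "TruthfulQA (0-shot)": 55.51,
    "Winogrande (5-shot)": 70.64,
    "GSM8k (5-shot)": 0.00,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 42.25
```

The zero GSM8k score drags the mean down noticeably; excluding it, the remaining five benchmarks average about 50.7.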