beberik/Nyxene-v2-11B

Text generation | Concurrency cost: 1 | Model size: 10.7B | Quant: FP8 | Context length: 4k | Published: Dec 4, 2023 | License: cc-by-nc-4.0 | Architecture: Transformer | Open weights

beberik/Nyxene-v2-11B is a 10.7-billion-parameter language model created by beberik. It is built from four 7B models (Starling-LM-7B-alpha, DPOpenHermes-7B, una-cybertron-7b-v2, and loyal-piano-m7-cdpo), which are first combined pairwise into two 11B intermediate models and then joined with a slerp merge that applies different interpolation weights to different layers and tensor types. It is intended for general-purpose chat and instruction following and averages 67.84 on the Open LLM Leaderboard.


Nyxene-v2-11B: A Merged Language Model

Nyxene-v2-11B is a 10.7-billion-parameter model developed by beberik and is the v2 release of the Nyxene series. It was built with mergekit by combining four 7B base models in two merge stages, with the aim of combining their strengths in a single model.

Key Capabilities & Architecture

The model is the product of a multi-stage merge:

  • Initial Merges: It first combines fblgit/una-cybertron-7b-v2 with chargoddard/loyal-piano-m7-cdpo to form "loyal-piano-cybertron-11B", and berkeley-nest/Starling-LM-7B-alpha with openaccess-ai-collective/DPOpenHermes-7B to form "Starling-DPOHermes-11B".
  • Final Merge: The two intermediate 11B models are then merged with the slerp (spherical linear interpolation) method, using separate interpolation factors t for different tensor types (e.g., lm_head, embed_tokens, self_attn, mlp, layernorm), so that each part of the network can lean more toward one parent model than the other.
  • Prompt Template: The recommended prompt template is an instruction-following chat format that marks each turn with <|system|>, <|user|>, and <|assistant|> tags.
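The slerp step of the final merge can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not mergekit's implementation: tensors are flattened lists, and the per-tensor t values in the hypothetical `T_RULES` table are placeholders, since the actual Nyxene-v2 values are not given here. Each tensor is interpolated along the arc between its two parents (t=0 keeps the first model, t=1 the second), with a fallback to linear interpolation when the two tensors point in nearly the same direction.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-tensor interpolation factors (placeholders, NOT the real
# Nyxene-v2 values). The first rule whose substring occurs in the tensor
# name wins; t=0 keeps model A's tensor, t=1 keeps model B's.
T_RULES: List[Tuple[str, float]] = [
    ("lm_head", 0.75),
    ("embed_tokens", 0.25),
    ("self_attn", 0.6),
    ("mlp", 0.4),
    ("layernorm", 0.5),
]
DEFAULT_T = 0.5  # used for tensors that match no rule


def t_for(tensor_name: str) -> float:
    """Pick the interpolation factor for a tensor from its name."""
    for pattern, t in T_RULES:
        if pattern in tensor_name:
            return t
    return DEFAULT_T


def slerp(t: float, v0: Sequence[float], v1: Sequence[float],
          dot_threshold: float = 0.9995, eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two flattened tensors."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two weight directions.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(n0 * n1, eps)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > dot_threshold:
        # Nearly (anti)parallel: the slerp formula is unstable, use plain lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def merge(model_a: Dict[str, List[float]],
          model_b: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Slerp every shared tensor, choosing t from the tensor's name."""
    return {name: slerp(t_for(name), model_a[name], model_b[name])
            for name in model_a}


if __name__ == "__main__":
    a = {"model.layers.0.self_attn.q_proj.weight": [1.0, 0.0],
         "lm_head.weight": [1.0, 2.0]}
    b = {"model.layers.0.self_attn.q_proj.weight": [0.0, 1.0],
         "lm_head.weight": [2.0, 4.0]}
    for name, values in merge(a, b).items():
        print(name, [round(x, 3) for x in values])
```

Slerp is usually preferred over a plain weighted average because it preserves the geometry of the weights better when the parents differ substantially. In mergekit, a filter's t can also be given as a list of values that varies with layer depth; this sketch uses one scalar per tensor type for brevity.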
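A small helper can assemble a single-turn prompt in this tag format. The exact layout used here (each tag on its own line, each finished turn ending in `</s>`, and the assistant turn left open for the model to complete) is an assumption based on the common Zephyr-style convention and is not confirmed by this card; `build_prompt` and its `eos` parameter are hypothetical. If the tokenizer ships a chat template, Hugging Face Transformers' `tokenizer.apply_chat_template` is the safer way to produce the exact string.

```python
from typing import Optional


def build_prompt(user_message: str,
                 system_message: Optional[str] = None,
                 eos: str = "</s>") -> str:
    """Assemble a prompt in the <|system|>/<|user|>/<|assistant|> format.

    Assumption: each turn is a tag line followed by its text and an EOS
    marker (Zephyr-style layout); verify against the model's tokenizer config.
    """
    turns = []
    if system_message is not None:
        turns.append(f"<|system|>\n{system_message}{eos}")
    turns.append(f"<|user|>\n{user_message}{eos}")
    # Leave the assistant turn open so the model generates the reply.
    turns.append("<|assistant|>\n")
    return "\n".join(turns)


print(build_prompt("Summarize slerp merging in one sentence.",
                   system_message="You are a concise assistant."))
```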

Performance & Benchmarks

On the Open LLM Leaderboard, Nyxene-v2-11B achieves an average score of 67.84 across six benchmarks:

  • AI2 Reasoning Challenge (ARC, 25-shot): 67.41
  • HellaSwag (10-shot): 84.54
  • MMLU (5-shot): 65.26
  • TruthfulQA (0-shot): 55.62
  • Winogrande (5-shot): 79.56
  • GSM8k (5-shot): 54.66
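The reported average is the unweighted arithmetic mean of the six benchmark scores, which is easy to check. The short script below recomputes it and also picks out the strongest and weakest benchmarks; the dictionary labels are shorthand chosen for this illustration.

```python
from statistics import mean

# Open LLM Leaderboard scores for Nyxene-v2-11B, taken from the list above.
SCORES = {
    "ARC (25-shot)": 67.41,
    "HellaSwag (10-shot)": 84.54,
    "MMLU (5-shot)": 65.26,
    "TruthfulQA (0-shot)": 55.62,
    "Winogrande (5-shot)": 79.56,
    "GSM8k (5-shot)": 54.66,
}

average = mean(SCORES.values())             # simple unweighted mean
strongest = max(SCORES, key=SCORES.get)     # benchmark with the highest score
weakest = min(SCORES, key=SCORES.get)       # benchmark with the lowest score

print(f"Average: {average:.2f}")  # prints "Average: 67.84"
print(f"Strongest: {strongest}, weakest: {weakest}")
```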

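These scores point to solid commonsense reasoning (HellaSwag, Winogrande) and broad knowledge recall (MMLU, ARC), with multi-step math (GSM8k) and truthfulness (TruthfulQA) as comparatively weaker areas. The model is therefore a reasonable fit for general-purpose instruction following and knowledge-oriented tasks, while math-heavy outputs may warrant extra checking.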