beberik/Nyxene-v3-11B

Text Generation · Concurrency Cost: 1 · Model Size: 10.7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 12, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer · Open Weights

beberik/Nyxene-v3-11B is a 10.7 billion parameter language model developed by beberik, built through a multi-stage mergekit merge of four base models, including Intel/neural-chat-7b-v3-3-Slerp and AIDC-ai-business/Marcoroni-7B-v3. The recipe combines different layer ranges from the source models and applies slerp interpolation with per-tensor weighting. The model is designed for general language tasks and posts competitive results on the Open LLM Leaderboard, with an average score of 70.72 across six benchmarks.


beberik/Nyxene-v3-11B: A Merged 10.7B Parameter Model

Nyxene-v3-11B is a 10.7 billion parameter language model created by beberik by merging four 7B base models with mergekit. It is an evolution of Nyxene-v1-11B, incorporating new base-model components and a refined merging recipe.

Key Architectural Details

The model is built hierarchically from four base models, first forming two intermediate passthrough merges:

  • go-bruins-loyal-piano-11B: A passthrough merge combining specific layer ranges from rwitz/go-bruins-v2 (layers 0-24) and chargoddard/loyal-piano-m7-cdpo (layers 8-32).
  • neural-marcoroni-11B: Another passthrough merge, integrating layer ranges from AIDC-ai-business/Marcoroni-7B-v3 (layers 0-24) and Intel/neural-chat-7b-v3-3-Slerp (layers 8-32).
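A passthrough merge simply stacks the selected layer ranges of its sources in order. The bookkeeping below (a sketch; the `passthrough_layers` helper is illustrative, not mergekit's API) shows how two 32-layer 7B models yield a 48-layer stack, which is how the merge reaches roughly 10.7B parameters:

```python
# Layer bookkeeping for a mergekit "passthrough" merge (sketch).
# Each source contributes a half-open range of transformer layers,
# and passthrough stacks them in order without interpolation.
def passthrough_layers(slices):
    stacked = []
    for model, (start, end) in slices:
        stacked.extend((model, i) for i in range(start, end))
    return stacked

# Ranges from the go-bruins-loyal-piano-11B recipe above
layers = passthrough_layers([
    ("rwitz/go-bruins-v2", (0, 24)),
    ("chargoddard/loyal-piano-m7-cdpo", (8, 32)),
])
print(len(layers))  # 48 layers total, vs. 32 in each 7B source
```

Note that layers 8-23 appear twice in the stack, once from each source; this deliberate overlap is what grows the parameter count.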

These two intermediate merges are then combined using a slerp (spherical linear interpolation) method to form Nyxene-11B. This final merge applies specific weighting parameters (t values) to different tensor types (e.g., lm_head, embed_tokens, self_attn, mlp, layernorm) to fine-tune the model's characteristics. The model uses the ChatML prompt template.
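Slerp interpolates each pair of weight tensors along the unit hypersphere rather than linearly, which preserves magnitude better when the two tensors point in different directions. A minimal sketch of the underlying formula (not mergekit's actual implementation) is:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t values move
    along the great-circle arc between the normalized directions.
    """
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two vectors
    if omega < eps:
        # Nearly parallel: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    so = np.sin(omega)
    return np.sin((1 - t) * omega) / so * v0 + np.sin(t * omega) / so * v1

# Midpoint between two orthogonal unit vectors lies on the arc
mid = slerp(0.5, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In the actual merge, a separate t value per tensor type (lm_head, embed_tokens, self_attn, mlp, layernorm) biases each component toward one parent or the other.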

Performance Highlights

Evaluated on the Open LLM Leaderboard, Nyxene-v3-11B achieves an average score of 70.72. Notable benchmark results include:

  • AI2 Reasoning Challenge (25-shot): 69.62
  • HellaSwag (10-shot): 85.33
  • MMLU (5-shot): 64.75
  • TruthfulQA (0-shot): 60.91
  • Winogrande (5-shot): 80.19
  • GSM8k (5-shot): 63.53
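The reported average is the unweighted mean of the six per-task scores:

```python
# Mean of the six Open LLM Leaderboard scores listed above
scores = {
    "ARC (25-shot)": 69.62,
    "HellaSwag (10-shot)": 85.33,
    "MMLU (5-shot)": 64.75,
    "TruthfulQA (0-shot)": 60.91,
    "Winogrande (5-shot)": 80.19,
    "GSM8k (5-shot)": 63.53,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 70.72
```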

Use Cases

This model is suitable for general-purpose language generation and understanding tasks, particularly where a balance of reasoning, common sense, and factual recall is beneficial, as indicated by its diverse benchmark performance.
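Because the model expects the ChatML template, prompts should wrap each turn in `<|im_start|>`/`<|im_end|>` markers. A minimal formatter as a sketch (the `to_chatml` helper is illustrative; in practice a tokenizer's `apply_chat_template` method handles this):

```python
# Minimal ChatML prompt formatter (illustrative helper, not the
# model's official API). Each message becomes one <|im_start|> block,
# and a trailing assistant header cues the model to respond.
def to_chatml(messages):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize slerp in one sentence."},
])
print(prompt)
```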