beberik/Nyxene-v3-11B: A Merged 10.7B Parameter Model
Nyxene-v3-11B is a 10.7-billion-parameter language model created by beberik by merging existing models with mergekit. It is an evolution of Nyxene-v1-11B, incorporating new component models and refined merging strategies.
Key Architectural Details
The model is a hierarchical merge of four base models, combined in pairs into two intermediate merges:
- go-bruins-loyal-piano-11B: a passthrough merge combining layers 0-24 of rwitz/go-bruins-v2 with layers 8-32 of chargoddard/loyal-piano-m7-cdpo.
- neural-marcoroni-11B: a passthrough merge combining layers 0-24 of AIDC-ai-business/Marcoroni-7B-v3 with layers 8-32 of Intel/neural-chat-7b-v3-3-Slerp.
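In mergekit's YAML schema, each passthrough merge above would be described by a config along these lines. This is a sketch, not the author's actual file: only the model names and layer ranges come from the card; the dtype is an assumption.

```yaml
# Hypothetical mergekit config for go-bruins-loyal-piano-11B (passthrough frankenmerge).
# Layer ranges are from the model card; dtype is an assumption.
slices:
  - sources:
      - model: rwitz/go-bruins-v2
        layer_range: [0, 24]
  - sources:
      - model: chargoddard/loyal-piano-m7-cdpo
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

Stacking 24 + 24 layers from two 7B models in this way is how these "frankenmerges" reach roughly 10.7B parameters.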
These two intermediate merges are then combined using a slerp (spherical linear interpolation) method to form Nyxene-11B. This final merge applies specific weighting parameters (t values) to different tensor types (e.g., lm_head, embed_tokens, self_attn, mlp, layernorm) to fine-tune the model's characteristics. The model uses the ChatML prompt template.
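Slerp interpolates along the arc between two weight tensors rather than along the straight line between them, which preserves tensor magnitude better than plain averaging. A minimal pure-Python sketch of the operation, for illustration only (mergekit applies it per tensor, with the t value chosen by the tensor-type rules described above):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t = 0 returns v0, t = 1 returns v1; intermediate t values
    move along the arc between them.
    """
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    # Clamp to [-1, 1] to guard against floating-point drift
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1 + eps)))
    omega = math.acos(cos_omega)  # angle between the vectors
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s = math.sin(omega)
    return [(math.sin((1 - t) * omega) * a + math.sin(t * omega) * b) / s
            for a, b in zip(v0, v1)]
```

At t = 0.5 on two orthogonal unit vectors, the result lands midway along the arc rather than at the shorter chord midpoint, which is the property that motivates slerp for weight merging.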
Performance Highlights
Evaluated on the Open LLM Leaderboard, Nyxene-v3-11B achieves an average score of 70.72. Notable benchmark results include:
- AI2 Reasoning Challenge (25-shot): 69.62
- HellaSwag (10-shot): 85.33
- MMLU (5-shot): 64.75
- TruthfulQA (0-shot): 60.91
- Winogrande (5-shot): 80.19
- GSM8k (5-shot): 63.53
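The leaderboard average is simply the arithmetic mean of the six benchmark scores above, which is easy to verify:

```python
# Scores from the Open LLM Leaderboard results listed above
scores = [69.62, 85.33, 64.75, 60.91, 80.19, 63.53]
average = sum(scores) / len(scores)
print(round(average, 2))  # 70.72
```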
Use Cases
This model is suitable for general-purpose language generation and understanding tasks, particularly where a balance of reasoning, common sense, and factual recall is beneficial, as indicated by its diverse benchmark performance.
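Because the card specifies the ChatML prompt template, inputs should be wrapped in ChatML turns before generation. A minimal sketch of a prompt builder; the helper name and example messages are illustrative, not part of the model card:

```python
def build_chatml_prompt(messages):
    """Format a list of {"role": ..., "content": ...} messages in ChatML,
    ending with an open assistant turn for the model to complete."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain model merging in one sentence."},
])
print(prompt)
```

The resulting string can then be tokenized and passed to the model (e.g. via the Hugging Face transformers generation API).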