beberik/Nyxene-v3-11B: A Merged 10.7B Parameter Model
Nyxene-v3-11B is a 10.7 billion parameter language model created by beberik with a multi-stage mergekit pipeline. It is an evolution of Nyxene-v1-11B, incorporating new components and refined merging strategies.
Key Architectural Details
The model's architecture is a hierarchical merge of four distinct base models, combined pairwise into two intermediate merges (a layer-stacking sketch follows the list):
- `go-bruins-loyal-piano-11B`: a `passthrough` merge combining specific layer ranges from `rwitz/go-bruins-v2` (layers 0-24) and `chargoddard/loyal-piano-m7-cdpo` (layers 8-32).
- `neural-marcoroni-11B`: another `passthrough` merge, integrating layer ranges from `AIDC-ai-business/Marcoroni-7B-v3` (layers 0-24) and `Intel/neural-chat-7b-v3-3-Slerp` (layers 8-32).
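Conceptually, a `passthrough` merge copies whole decoder layers from each parent unchanged and stacks them into a deeper network: two 24-layer slices of 32-layer Mistral-style 7B models yield the 48-layer, roughly 10.7B parameter architecture. A toy Python sketch of the idea (not mergekit's actual implementation; the layer labels are stand-ins):

```python
# Toy sketch of a "passthrough" layer-range merge: the merged model simply
# stacks decoder layers copied from each source model, here layers 0-24
# from model A followed by layers 8-32 from model B.

def passthrough_merge(layers_a, layers_b, range_a=(0, 24), range_b=(8, 32)):
    """Stack the selected layer slices back to back."""
    return layers_a[range_a[0]:range_a[1]] + layers_b[range_b[0]:range_b[1]]

# Stand-in "layers": labeled strings instead of real weight tensors.
model_a = [f"go-bruins-v2/layer_{i}" for i in range(32)]
model_b = [f"loyal-piano-m7-cdpo/layer_{i}" for i in range(32)]

merged = passthrough_merge(model_a, model_b)
print(len(merged))  # 48 layers -> roughly 10.7B parameters at Mistral-7B width
```

Each slice keeps its original weights; only the depth of the network changes.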
These two intermediate merges are then combined using slerp (spherical linear interpolation) to form Nyxene-11B. This final merge applies specific weighting parameters (`t` values) to different tensor types (e.g., `lm_head`, `embed_tokens`, `self_attn`, `mlp`, `layernorm`) to fine-tune the model's characteristics. The model uses the ChatML prompt template. For intuition, a minimal sketch of the interpolation follows.
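Slerp interpolates along the arc between two weight vectors rather than the straight line used by plain averaging, which better preserves the magnitude structure of the tensors. The sketch below assumes the standard slerp formula and a hypothetical `T_BY_TENSOR` map; mergekit's real implementation differs in details such as normalization and edge-case handling:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values follow the arc
    between the two (flattened) vectors instead of a straight line.
    """
    a, b = v0.ravel(), v1.ravel()
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if np.sin(theta) < eps:  # nearly parallel vectors: fall back to lerp
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1

# Hypothetical per-tensor-type weights, mimicking how a slerp merge config
# can assign different t values to attention, MLP, and output tensors.
T_BY_TENSOR = {"self_attn": 0.3, "mlp": 0.7, "lm_head": 0.5}

w0 = np.random.randn(4, 4)  # stand-in tensor from the first parent
w1 = np.random.randn(4, 4)  # stand-in tensor from the second parent
blended = slerp(T_BY_TENSOR["mlp"], w0, w1)
```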
Performance Highlights
Evaluated on the Open LLM Leaderboard, Nyxene-v3-11B achieves an average score of 70.72 across the six benchmarks below:
- AI2 Reasoning Challenge (25-shot): 69.62
- HellaSwag (10-shot): 85.33
- MMLU (5-shot): 64.75
- TruthfulQA (0-shot): 60.91
- Winogrande (5-shot): 80.19
- GSM8k (5-shot): 63.53
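As a quick sanity check, the reported leaderboard average is the plain mean of these six scores:

```python
scores = {
    "ARC (25-shot)": 69.62,
    "HellaSwag (10-shot)": 85.33,
    "MMLU (5-shot)": 64.75,
    "TruthfulQA (0-shot)": 60.91,
    "Winogrande (5-shot)": 80.19,
    "GSM8k (5-shot)": 63.53,
}
print(round(sum(scores.values()) / len(scores), 2))  # 70.72
```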
Use Cases
This model is suitable for general-purpose language generation and understanding tasks, particularly where a balance of reasoning, common sense, and factual recall is beneficial, as reflected in its benchmark profile.
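As a starting point, here is a standard transformers loading sketch using the ChatML format noted above; the system/user text and generation settings are illustrative placeholders, not recommendations from the model author:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beberik/Nyxene-v3-11B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# ChatML prompt format, as stated in the model card.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain slerp in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```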