beberik/Nyxene-v2-11B
beberik/Nyxene-v2-11B is a 10.7 billion parameter language model created by beberik, built by merging four 7B models: Starling-LM-7B-alpha, DPOpenHermes-7B, una-cybertron-7b-v2, and loyal-piano-m7-cdpo. The final merge uses the slerp method, with interpolation weights set separately for different tensor types. The model is intended for general-purpose conversational AI and instruction following, and averages 67.84 on the Open LLM Leaderboard.
Nyxene-v2-11B: A Merged Language Model
Nyxene-v2-11B is a 10.7 billion parameter model developed by beberik as a later iteration of the Nyxene series. It is built with mergekit by combining four distinct 7B base models.
Key Capabilities & Architecture
The model is produced by a multi-stage merging strategy:
- Initial Merges: It first combines fblgit/una-cybertron-7b-v2 with chargoddard/loyal-piano-m7-cdpo to form "loyal-piano-cybertron-11B", and berkeley-nest/Starling-LM-7B-alpha with openaccess-ai-collective/DPOpenHermes-7B to form "Starling-DPOHermes-11B".
- Final Merge: These two intermediate 11B models are then merged using a slerp method, with specific t parameter weightings applied to different tensor types (e.g., lm_head, embed_tokens, self_attn, mlp, layernorm) to shape the final model's characteristics.
- Prompt Template: The recommended prompt template is a standard instruction-following format that uses <|system|>, <|user|>, and <|assistant|> tags.
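The final merge step can be illustrated with a minimal sketch of spherical linear interpolation (slerp) with a per-tensor-type t. This is not mergekit's implementation and not the author's actual configuration: the t values in T_BY_FILTER, the default of 0.5, and the helper names t_for and merge_tensors are all hypothetical, and plain Python lists stand in for flattened weight tensors.

```python
import math

# Hypothetical interpolation weights per tensor-name filter.
# The real values used for Nyxene-v2-11B are not given here.
T_BY_FILTER = {
    "lm_head": 0.35,
    "embed_tokens": 0.65,
    "self_attn": 0.6,
    "mlp": 0.5,
    "layernorm": 0.5,
}
DEFAULT_T = 0.5  # assumed fallback for tensors matching no filter


def lerp(t, a, b):
    """Plain linear interpolation, used as a fallback."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp(t, a, b, eps=1e-6):
    """Interpolate from a (t=0) to b (t=1) along the arc between them."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return lerp(t, a, b)
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    if abs(cos_theta) > 1.0 - eps:
        # Nearly parallel vectors: the slerp formula is unstable, so use lerp.
        return lerp(t, a, b)
    theta = math.acos(cos_theta)
    weight_a = math.sin((1.0 - t) * theta) / math.sin(theta)
    weight_b = math.sin(t * theta) / math.sin(theta)
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def t_for(tensor_name):
    """Pick t by the first filter that appears in the tensor name."""
    for key, t in T_BY_FILTER.items():
        if key in tensor_name:
            return t
    return DEFAULT_T


def merge_tensors(tensors_a, tensors_b):
    """Slerp every tensor shared by two name -> flat-list mappings."""
    return {
        name: slerp(t_for(name), tensors_a[name], tensors_b[name])
        for name in tensors_a
        if name in tensors_b
    }


if __name__ == "__main__":
    a = {"model.layers.0.mlp.up_proj.weight": [1.0, 0.0]}
    b = {"model.layers.0.mlp.up_proj.weight": [0.0, 1.0]}
    print(merge_tensors(a, b))
```

With t = 0 the result equals the first model's tensor and with t = 1 the second's, so a per-type t controls how much each intermediate model contributes to, say, attention versus MLP weights.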
Performance & Benchmarks
Evaluated on the Open LLM Leaderboard, Nyxene-v2-11B achieves an average score of 67.84. Notable scores include:
- AI2 Reasoning Challenge (25-shot): 67.41
- HellaSwag (10-shot): 84.54
- MMLU (5-shot): 65.26
- TruthfulQA (0-shot): 55.62
- Winogrande (5-shot): 79.56
- GSM8k (5-shot): 54.66
The scores are strongest on the commonsense benchmarks (HellaSwag and Winogrande) and weakest on TruthfulQA and GSM8k. Taken together, they suggest the model is suitable for general-purpose applications that depend on instruction following and knowledge recall.
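The reported average can be checked directly from the six benchmark scores. The short sketch below assumes the average is the unweighted mean of those scores; the dictionary keys are just labels chosen for this example.

```python
# Per-benchmark scores as listed above.
scores = {
    "ARC (25-shot)": 67.41,
    "HellaSwag (10-shot)": 84.54,
    "MMLU (5-shot)": 65.26,
    "TruthfulQA (0-shot)": 55.62,
    "Winogrande (5-shot)": 79.56,
    "GSM8k (5-shot)": 54.66,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 67.84
```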