Nyxene-v1-11B: A Merged Language Model for Enhanced Creativity
Nyxene-v1-11B is a 10.7-billion-parameter language model developed by beberik as an iteration on earlier merging experiments. It is built by combining several 7B-class models with mergekit, a toolkit for merging the weights of pretrained language models so that a single model carries the learned representations of its sources.
Key Merged Components
The model is a multi-stage merge of several 7B models, including:
- berkeley-nest/Starling-LM-7B-alpha
- openaccess-ai-collective/DPOpenHermes-7B
- fblgit/juanako-7b-UNA
- chargoddard/loyal-piano-m7
- argilla/notus-7b-v1
The merge was carried out in multiple stages, with the stated goal of making the model more creative than earlier versions. The model card's "secret sauce" section gives the mergekit configurations used, including the slerp and passthrough merge methods and the parameter weighting applied to different layers.
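To make the slerp step concrete, the sketch below shows spherical linear interpolation between two flat weight vectors in plain Python. It is only an illustration of the general technique, not mergekit's implementation and not the configuration used for this model; the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation are assumptions made for the example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length weight vectors.

    t = 0 returns v0, t = 1 returns v1. Hypothetical helper for illustration;
    real merge tools operate on full tensors, layer by layer.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    def lerp():
        # Plain linear interpolation, used when the angle is undefined or tiny.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    if norm0 < eps or norm1 < eps:
        return lerp()

    # Cosine of the angle between the two vectors, clamped for acos.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    if sin_theta < eps:
        # Vectors are (anti)parallel; slerp weights would divide by ~0.
        return lerp()

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a plain average, the interpolated vector follows the arc between the two inputs, which is why slerp is often preferred when blending weights from two fine-tunes. The passthrough method, by contrast, does not blend weights at all; it copies selected layers from the source models into the output.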
Performance Benchmarks
Nyxene-v1-11B has been evaluated on the Open LLM Leaderboard, achieving an overall average score of 67.58. Per-benchmark scores:
- AI2 Reasoning Challenge (25-shot): 67.49
- HellaSwag (10-shot): 84.52
- MMLU (5-shot): 65.12
- TruthfulQA (0-shot): 57.28
- Winogrande (5-shot): 79.01
- GSM8k (5-shot): 52.08
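The reported average is the unweighted mean of the six benchmark scores, which can be checked with a few lines of Python. The dictionary keys are just labels chosen for this example.

```python
# Scores as listed above; keys are illustrative labels.
scores = {
    "ARC": 67.49,
    "HellaSwag": 84.52,
    "MMLU": 65.12,
    "TruthfulQA": 57.28,
    "Winogrande": 79.01,
    "GSM8k": 52.08,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 67.58
```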
Recommended Prompt Template
The model is intended to be used with the following instruction-style prompt format:
<|system|>
Below is an instruction that describes a task. Write a response that appropriately completes the request.
<|user|>
{prompt}
<|assistant|>
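As a minimal sketch, the template can be filled in with a small helper like the one below. The names `DEFAULT_SYSTEM` and `build_prompt` are hypothetical, and ending the string with a newline after `<|assistant|>` is an assumption; the exact whitespace the model expects is not specified here.

```python
# Default system message, taken from the template above.
DEFAULT_SYSTEM = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(prompt, system=DEFAULT_SYSTEM):
    """Return the full prompt string for a single user instruction.

    Hypothetical helper: the trailing newline after <|assistant|> is assumed.
    """
    return f"<|system|>\n{system}\n<|user|>\n{prompt}\n<|assistant|>\n"


text = build_prompt("Write a short poem about the sea.")
```

The resulting string can be passed to whichever inference library is used to run the model; generation should continue from the `<|assistant|>` marker.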
Use Cases
This model is best suited to creative text generation and general language understanding tasks, drawing on the capabilities of its merged components.