beberik/Nyxene-v1-11B

Text generation
  • Model size: 10.7B
  • Quantization: FP8
  • Context length: 4k
  • Published: Dec 4, 2023
  • License: cc-by-nc-4.0
  • Architecture: Transformer (open weights)
  • Concurrency cost: 1

beberik/Nyxene-v1-11B is a 10.7-billion-parameter language model by beberik, built with mergekit by merging several 7B-class models, including Starling-LM-7B-alpha and DPOpenHermes-7B. It is aimed at creative text generation and scores an average of 67.58 on the Open LLM Leaderboard across six reasoning and language-understanding benchmarks.


Nyxene-v1-11B: A Merged Language Model for Enhanced Creativity

Nyxene-v1-11B is a 10.7-billion-parameter language model developed by beberik as an iteration on the author's earlier merging experiments. It is built by combining several 7B-class base models with mergekit, a toolkit that merges the weights of pretrained models with compatible architectures.
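Weight-averaging methods such as slerp keep every tensor's shape, so merging 7B models alone yields another 7B model; the jump to 10.7B has to come from stacking layers in a passthrough stage. The sketch below checks that arithmetic under stated assumptions: it uses the published Mistral-7B dimensions (assuming every source model shares that shape) and a hypothetical 48-layer stack built from slices [0, 24) of one intermediate merge and [8, 32) of another. That slice layout is common for 11B merges but has not been verified against this model's own config, and the names `merge_a`, `merge_b`, and all helper functions are illustrative only.

```python
# Sketch: why a merge of 7B models can end up with ~10.7B parameters.
# ASSUMPTIONS: all source models use the Mistral-7B shape below, and the
# passthrough stage stacks 48 decoder layers in a hypothetical layout.

HIDDEN = 4096          # model width
INTERMEDIATE = 14336   # MLP inner width
N_HEADS = 32           # query heads
N_KV_HEADS = 8         # key/value heads (grouped-query attention)
VOCAB = 32000          # tokenizer vocabulary size


def decoder_layer_params() -> int:
    """Parameters in one Mistral-style decoder layer (no bias terms)."""
    head_dim = HIDDEN // N_HEADS
    kv_dim = N_KV_HEADS * head_dim
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q/o plus k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                          # gate, up, down projections
    norms = 2 * HIDDEN                                       # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params(n_layers: int) -> int:
    """Decoder stack + input embeddings + untied LM head + final norm."""
    return n_layers * decoder_layer_params() + 2 * VOCAB * HIDDEN + HIDDEN


def passthrough_layers(slices):
    """Stack layers passthrough-style; each slice is (model, start, end), end exclusive."""
    stacked = []
    for model, start, end in slices:
        stacked.extend((model, i) for i in range(start, end))
    return stacked


# Hypothetical slice layout and model names, for illustration only.
layers = passthrough_layers([("merge_a", 0, 24), ("merge_b", 8, 32)])
print(len(layers))                                # 48
print(f"{total_params(32) / 1e9:.2f}B")           # 7.24B: one 7B source model
print(f"{total_params(len(layers)) / 1e9:.2f}B")  # 10.73B: the stacked model
```

With 32 layers the formula reproduces Mistral-7B's size, and with 48 stacked layers it lands at about 10.7B, which is consistent with the size reported for this model.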

Key Merged Components

The model's weights are merged from the following base models:

  • berkeley-nest/Starling-LM-7B-alpha
  • openaccess-ai-collective/DPOpenHermes-7B
  • fblgit/juanako-7b-UNA
  • chargoddard/loyal-piano-m7
  • argilla/notus-7b-v1

The merge was carried out in multiple stages, combining these models with the aim of making the output more creative than in earlier versions. The original model card's "secret sauce" section lists the exact mergekit configurations, including the slerp and passthrough merge methods and the per-layer parameter weighting.
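Slerp (spherical linear interpolation) blends two models' weights tensor by tensor along the arc between them rather than along a straight line. The sketch below is a minimal NumPy version of that formula, written to follow the approach mergekit's slerp method takes (measure the angle on normalized copies, apply the coefficients to the original tensors, fall back to linear interpolation when the tensors are nearly parallel); it is not mergekit's code. The `layer_t` helper, which spreads a list of interpolation values evenly across layers, is an assumption about how per-layer weighting can work, and the gradient values in the usage lines are hypothetical, not the ones used for Nyxene.

```python
import numpy as np


def slerp(t: float, v0: np.ndarray, v1: np.ndarray, dot_threshold: float = 0.9995) -> np.ndarray:
    """Spherical linear interpolation between two same-shaped weight tensors.

    t=0 returns v0 and t=1 returns v1. The angle is measured between the
    normalized, flattened tensors; the blend is applied to the originals.
    """
    a = v0.ravel().astype(np.float64)
    b = v1.ravel().astype(np.float64)
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    dot = float(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if abs(dot) > dot_threshold:
        # Nearly (anti)parallel: the sine terms become unstable, so use plain lerp.
        return ((1 - t) * a + t * b).reshape(v0.shape)
    theta = np.arccos(dot)
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return (s0 * a + s1 * b).reshape(v0.shape)


def layer_t(gradient, layer_idx: int, n_layers: int) -> float:
    """Spread anchor values evenly over the layers and read off one layer's t (hypothetical helper)."""
    anchors = np.linspace(0.0, 1.0, len(gradient))
    return float(np.interp(layer_idx / (n_layers - 1), anchors, gradient))


# Usage: random stand-ins for one weight matrix from each of two models.
rng = np.random.default_rng(0)
w_a, w_b = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
gradient = [0.0, 0.5, 0.3, 0.7, 1.0]  # hypothetical per-layer schedule
merged = [slerp(layer_t(gradient, i, 32), w_a, w_b) for i in range(32)]
print(merged[0].shape)  # (4, 4)
```

Interpolating on the arc rather than the chord keeps the blended tensor's norm closer to that of its parents, which is the usual motivation for preferring slerp over a plain weighted average.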

Performance Benchmarks

Nyxene-v1-11B has been evaluated on the Open LLM Leaderboard, with an overall average score of 67.58. The six benchmark scores that make up this average are:

  • AI2 Reasoning Challenge (25-shot): 67.49
  • HellaSwag (10-shot): 84.52
  • MMLU (5-shot): 65.12
  • TruthfulQA (0-shot): 57.28
  • Winogrande (5-shot): 79.01
  • GSM8k (5-shot): 52.08
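The overall figure is the unweighted mean of these six scores, which a few lines of Python can confirm using only the numbers listed above:

```python
# Recompute the leaderboard average from the six listed benchmark scores.
scores = {
    "ARC (25-shot)": 67.49,
    "HellaSwag (10-shot)": 84.52,
    "MMLU (5-shot)": 65.12,
    "TruthfulQA (0-shot)": 57.28,
    "Winogrande (5-shot)": 79.01,
    "GSM8k (5-shot)": 52.08,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 67.58
```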

Recommended Prompt Template

The model card recommends the following instruction-style prompt format:

<|system|>
Below is an instruction that describes a task. Write a response that appropriately completes the request.
<|user|>
{prompt}
<|assistant|>
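A small helper can fill this template programmatically. The sketch below is a hypothetical `build_prompt` function, not part of any official tooling; it assumes each tag sits on its own line and that generation should continue right after a newline following `<|assistant|>`, since the template does not pin down exact whitespace.

```python
# Hypothetical helper that fills the recommended prompt template.
DEFAULT_SYSTEM = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(user_prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    """Return the filled template; the model's reply is generated after '<|assistant|>'."""
    return (
        "<|system|>\n"
        f"{system}\n"
        "<|user|>\n"
        f"{user_prompt}\n"
        "<|assistant|>\n"  # assumed trailing newline before the response
    )


print(build_prompt("Write a short poem about the night sky."))
```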

Use Cases

The model is aimed primarily at creative text generation. Its leaderboard scores suggest it can also handle general reasoning and language-understanding tasks, with its weakest result on the GSM8k math benchmark.