Kukedlc/NeuTrixOmniBe-DPO
Kukedlc/NeuTrixOmniBe-DPO is a 7-billion-parameter language model developed by Kukedlc. It was created by merging CultriX/NeuralTrix-7B-dpo and paulml/OmniBeagleSquaredMBX-v3-7B-v2 with LazyMergekit, then fine-tuned with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset. The model achieves an average score of 76.17 on the Open LLM Leaderboard, with strong results on benchmarks such as HellaSwag (89.03) and Winogrande (85.16). It is intended for general language understanding and generation tasks, leveraging its merged architecture and DPO training for improved response quality.
NeuTrixOmniBe-DPO Overview
NeuTrixOmniBe-DPO is a 7-billion-parameter language model developed by Kukedlc. It was constructed by merging two base models, CultriX/NeuralTrix-7B-dpo and paulml/OmniBeagleSquaredMBX-v3-7B-v2, using the LazyMergekit tool. Following the merge, the model was further refined through Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset.
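LazyMergekit is a wrapper around mergekit, which is driven by a YAML merge configuration. The exact configuration used for this model is not reproduced in this card; the sketch below only illustrates the general shape of a mergekit slerp config for these two base models (the layer ranges, interpolation weights, and dtype are illustrative assumptions, not the published settings):

```yaml
# Illustrative mergekit slerp configuration (values are assumptions).
slices:
  - sources:
      - model: CultriX/NeuralTrix-7B-dpo
        layer_range: [0, 32]
      - model: paulml/OmniBeagleSquaredMBX-v3-7B-v2
        layer_range: [0, 32]
merge_method: slerp
base_model: CultriX/NeuralTrix-7B-dpo
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

The `t` parameter controls the interpolation point between the two models per layer group; `t: 0.5` blends them equally.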
Key Capabilities & Performance
This model demonstrates solid performance across a range of benchmarks, achieving an average score of 76.17 on the Open LLM Leaderboard. Notable benchmark results include:
- HellaSwag (10-shot): 89.03
- Winogrande (5-shot): 85.16
- TruthfulQA (0-shot): 77.21
- AI2 Reasoning Challenge (25-shot): 72.78
- GSM8k (5-shot): 68.54
- MMLU (5-shot): 64.28
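As a quick sanity check, the stated leaderboard average is simply the arithmetic mean of the six benchmark scores listed above:

```python
# Recompute the Open LLM Leaderboard average from the six benchmark scores.
scores = {
    "HellaSwag": 89.03,
    "Winogrande": 85.16,
    "TruthfulQA": 77.21,
    "ARC": 72.78,
    "GSM8k": 68.54,
    "MMLU": 64.28,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 76.17
```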
Training & Architecture
The model's architecture results from a slerp (spherical linear interpolation) merge, which combines the weights of its two base models layer by layer. The subsequent DPO training phase aligns the model's outputs more closely with human preferences, improving its ability to generate high-quality, relevant responses. The development process was informed by Maxime Labonne's LLM course.
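The DPO objective used in this kind of training phase can be sketched numerically. The snippet below is an illustrative, self-contained implementation of the per-pair DPO loss, not Kukedlc's actual training code; the `beta` value and log-probabilities are made-up inputs:

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response (relative to the reference) than the rejected one.
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # Negative log-sigmoid of the margin: small when the policy
    # clearly prefers the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no preference shift relative to the reference, the loss is ln(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```

Training on Intel/orca_dpo_pairs minimizes this loss over many (chosen, rejected) response pairs, nudging the merged model toward preferred responses without a separate reward model.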
Important Note
As noted in the original README, the model has a known bug in which the string "INSTINST" can appear in generated responses. Users should be aware of this output artifact when deploying the model.