vicgalle/solarized-13B-dpo
vicgalle/solarized-13B-dpo is a 15 billion parameter DPO-tuned language model created by vicgalle. It is a frankenmerge of Nous-Hermes-2-SOLAR-10.7B and SOLAR-10.7B-Instruct, fine-tuned with DPO on a high-quality preference dataset. Its sample generations show creative, nuanced text. On the Open LLM Leaderboard it averages 62.05, with 59.12 on MMLU and 81.82 on HellaSwag.
Model Overview
vicgalle/solarized-13B-dpo is a 15 billion parameter language model developed by vicgalle. It is a "frankenmerge": its layer stack is built by alternating layers from two base models, Nous-Hermes-2-SOLAR-10.7B and SOLAR-10.7B-Instruct. After the merge, the model was fine-tuned with Direct Preference Optimization (DPO) on a high-quality preference dataset.
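As a rough illustration of how alternating layers from two donor models can yield a deeper stack, the sketch below builds a merge plan from overlapping layer windows. The window and stride values, the donor labels "A" and "B", and the function name `build_merge_plan` are hypothetical and chosen only for the example; the actual slice layout used for this model is not stated here.

```python
from typing import List, Tuple


def build_merge_plan(n_layers: int, window: int, stride: int) -> List[Tuple[str, int]]:
    """Return (donor, layer_index) pairs for a merged layer stack.

    Windows of `window` consecutive layers are taken every `stride` layers,
    alternating between donor "A" and donor "B". When stride < window the
    windows overlap, so the merged stack is deeper than either donor.
    """
    donors = ("A", "B")
    plan: List[Tuple[str, int]] = []
    start = 0
    slice_index = 0
    while start + window <= n_layers:
        donor = donors[slice_index % 2]
        plan.extend((donor, layer) for layer in range(start, start + window))
        start += stride
        slice_index += 1
    return plan


# Toy numbers (hypothetical): two 8-layer donors, windows of 4, stride of 2.
plan = build_merge_plan(n_layers=8, window=4, stride=2)
print(len(plan))  # 12 layers in the merged stack
print(plan)
```

With these toy values the merged stack has 12 layers built from two 8-layer donors, which is the general reason a merge of two 10.7B models can end up with more parameters than either one.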
Key Capabilities & Performance
The model averages 62.05 on the Open LLM Leaderboard. Reported individual scores include:
- HellaSwag (10-Shot): 81.82
- MMLU (5-Shot): 59.12
- AI2 Reasoning Challenge (25-Shot): 62.71
- TruthfulQA (0-Shot): 66.25
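The four scores listed above do not average to 62.05 on their own, so the leaderboard average presumably covers additional benchmarks. The arithmetic below is a sketch under the assumption that the average is taken over six benchmarks, four of which are listed here; the count of six and the variable names are assumptions, not figures from this page.

```python
# Scores copied from the list above.
listed = {
    "HellaSwag (10-shot)": 81.82,
    "MMLU (5-shot)": 59.12,
    "AI2 Reasoning Challenge (25-shot)": 62.71,
    "TruthfulQA (0-shot)": 66.25,
}
leaderboard_average = 62.05

# Assumption: the leaderboard average spans six benchmarks in total.
ASSUMED_BENCHMARK_COUNT = 6

listed_sum = sum(listed.values())
listed_mean = listed_sum / len(listed)

# Mean score the unlisted benchmarks would need for the averages to agree.
implied_total = leaderboard_average * ASSUMED_BENCHMARK_COUNT
unlisted_mean = (implied_total - listed_sum) / (ASSUMED_BENCHMARK_COUNT - len(listed))

print(f"mean of the four listed scores: {listed_mean:.3f}")
print(f"implied mean of the unlisted scores: {unlisted_mean:.2f}")
```

Under that assumption the unlisted benchmarks would have to score noticeably lower than the four shown, which is worth keeping in mind when reading the headline average.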
The DPO stage trains the model on a preference dataset, which is intended to steer it toward preferred, better-aligned, and more nuanced outputs; its creative sample generations illustrate the result.
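For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO loss, computed from sequence log-probabilities under the policy being trained and a frozen reference model. The value `beta=0.1` and the function name `dpo_loss` are illustrative assumptions; the hyperparameters actually used for this model are not given here.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the model card
) -> float:
    """Standard DPO loss for one (chosen, rejected) preference pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals softplus(-m); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-3.0, -9.0, -5.0, -5.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, which is how preference data shapes the outputs without a separate reward model.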
Potential Use Cases
- Creative Content Generation: Produces imaginative, detailed narratives; one sample generation is a movie review written with a specific, unconventional focus.
- Instruction Following: The DPO tuning implies improved adherence to user instructions and preferences.
- General Text Generation: Suitable for a wide range of text generation tasks where quality and coherence are important.
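For the text generation uses listed above, a minimal loading sketch with the Hugging Face `transformers` pipeline might look like the following. It assumes the checkpoint is published on the Hugging Face Hub under the ID in the title, that `transformers` and a backend such as PyTorch are installed, and that enough memory is available; the plain-text prompt, the sampling settings, and the `generate` helper are illustrative assumptions, not a documented prompt format for this model.

```python
MODEL_ID = "vicgalle/solarized-13B-dpo"  # ID from the title; Hub availability is assumed


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a continuation for `prompt` with the text-generation pipeline."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,  # illustrative sampling setting
    )
    # The pipeline returns a list of dicts; the text includes the prompt.
    return outputs[0]["generated_text"]


if __name__ == "__main__":
    print(generate("Write a short movie review that focuses only on the lighting."))
```

The pipeline is rebuilt on every call here to keep the sketch short; for repeated use it would make sense to construct it once and reuse it.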