macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo
macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo is a 10.7 billion parameter instruction-tuned language model, fine-tuned by macadeliccc from upstageai/Solar-10.7b-Instruct-v0.1. The base model was further refined with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and jondurbin/truthy-dpo-v0.1 datasets to improve truthfulness and instruction following. It achieves an average score of 61.26% across AGIEval, GPT4All, TruthfulQA, and Bigbench, making it suitable for general conversational AI and tasks that demand factual accuracy.
Model Overview
macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo is a 10.7 billion parameter instruction-tuned language model, developed by macadeliccc. It is a fine-tuned version of upstageai/Solar-10.7b-Instruct-v0.1, enhanced through a two-step DPO (Direct Preference Optimization) process.
Training Process
- Initial Fine-tuning: The base model, upstageai/Solar-10.7b-Instruct-v0.1, was fine-tuned for one epoch on the Intel/orca_dpo_pairs dataset (12.4k preference pairs).
- Further Refinement: The intermediate model was then fine-tuned for three epochs on the jondurbin/truthy-dpo-v0.1 dataset (1.04k preference pairs). This experimental second stage targets the model's truthfulness and adherence to instructions.
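Both stages above optimize the standard DPO objective: raise the policy's log-probability of the preferred response relative to a frozen reference model, and lower it for the rejected one. A minimal per-pair sketch of that loss (a generic illustration, not macadeliccc's actual training code; the function name and the default beta are assumptions):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are sequence log-probabilities under the policy being
    trained; ref_logp_* are the same quantities under the frozen
    reference model. beta scales how far the policy may drift from
    the reference.
    """
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When policy and reference agree exactly, the margin is 0
# and the loss is log(2)
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```

In practice this objective is applied batch-wise over the preference datasets listed above (e.g. via a library such as TRL), but the scalar form shows what each (chosen, rejected) pair contributes.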
Performance & Benchmarks
The model has been evaluated across several benchmarks, demonstrating its capabilities in various domains:
- Overall Average Score: 61.26% across AGIEval, GPT4All, TruthfulQA, and Bigbench.
- TruthfulQA: Achieved an average of 76.81%, indicating a focus on factual accuracy.
- GPT4All: Scored 73.82% on average, with strong performance on tasks like BoolQ (88.20%) and HellaSwag (86.39% acc_norm).
- Open LLM Leaderboard: Achieved an average score of 74.11, with specific metrics including MMLU (65.45%) and TruthfulQA (76.75%).
Use Cases
This model is suitable for applications requiring instruction-following and a focus on generating truthful responses. Its performance on benchmarks suggests utility in general conversational AI, question answering, and tasks where factual correctness is prioritized.
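For quick experimentation with such instruction-following use cases, a single-turn prompt can be formatted in the instruct style used by the SOLAR family. This is a hypothetical helper under an assumed template; confirm the exact layout against the model tokenizer's chat template before relying on it:

```python
def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the SOLAR instruct style.

    Assumption: the '### User:' / '### Assistant:' layout used by
    the upstage SOLAR instruct models also applies to this
    fine-tune; verify against the tokenizer's chat template.
    """
    return f"### User:\n{user_message}\n\n### Assistant:\n"

prompt = build_prompt("Which planet is closest to the sun?")
print(prompt)
```

The resulting string can then be tokenized and passed to the model's generate method with the Hugging Face transformers library.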