macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo

Text generation · Concurrency cost: 1 · Model size: 10.7B · Quantization: FP8 · Context length: 4K · Published: Jan 25, 2024 · License: cc · Architecture: Transformer

macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo is a 10.7 billion parameter instruction-tuned language model, fine-tuned by macadeliccc from upstageai/Solar-10.7b-Instruct-v0.1. The model was further refined with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and jondurbin/truthy-dpo-v0.1 datasets, aiming to enhance truthfulness and instruction following. It achieves an average score of 61.26% across AGIEval, GPT4All, TruthfulQA, and Bigbench, making it suitable for general conversational AI and tasks where factual accuracy matters.


Model Overview

macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo is a 10.7 billion parameter instruction-tuned language model, developed by macadeliccc. It is a fine-tuned version of upstageai/Solar-10.7b-Instruct-v0.1, enhanced through a two-step DPO (Direct Preference Optimization) process.

Training Process

  1. Initial Fine-tuning: The base model, upstageai/Solar-10.7b-Instruct-v0.1, was fine-tuned for one epoch using the Intel/orca_dpo_pairs dataset, which contains 12.4k samples.
  2. Further Refinement: This intermediate model was then further fine-tuned for three epochs with the jondurbin/truthy-dpo-v0.1 dataset, comprising 1.04k samples. This experimental process aims to improve the model's truthfulness and adherence to instructions.
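Both stages above optimize the same per-pair preference objective. A minimal sketch of the DPO loss for a single chosen/rejected pair, in pure Python with a numerically stable log-sigmoid (the `beta` value here is illustrative, not the one used for this model's training):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares how much the policy (vs. the frozen
    reference model) prefers the chosen completion over the rejected one."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Stable form of -log(sigmoid(margin)) = log(1 + exp(-margin)).
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))
```

When the policy shifts probability toward the chosen completion relative to the reference, the margin grows and the loss falls; in practice libraries such as `trl` batch this over sequence log-probabilities.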

Performance & Benchmarks

The model has been evaluated across several benchmarks, demonstrating its capabilities in various domains:

  • Overall Average Score: 61.26% across AGIEval, GPT4All, TruthfulQA, and Bigbench.
  • TruthfulQA: Achieved an average of 76.81%, indicating a focus on factual accuracy.
  • GPT4All: Scored 73.82% on average, with strong performance on tasks like BoolQ (88.20%) and HellaSwag (86.39% acc_norm).
  • Open LLM Leaderboard: Achieved an average score of 74.11, with specific metrics including MMLU (65.45%) and TruthfulQA (76.75%).

Use Cases

This model is suitable for applications requiring instruction-following and a focus on generating truthful responses. Its performance on benchmarks suggests utility in general conversational AI, question answering, and tasks where factual correctness is prioritized.
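For such instruction-following use, prompts are typically formatted with the base SOLAR instruct template. A minimal sketch of a prompt builder, assuming the `### User:` / `### Assistant:` markers inherited from the base model (verify against the tokenizer's chat template before relying on it):

```python
def build_prompt(messages):
    """Format a chat history into the assumed SOLAR instruction
    template, ending with an open Assistant turn for generation."""
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"### {role}:\n{m['content']}")
    parts.append("### Assistant:\n")  # model completes from here
    return "\n\n".join(parts)
```

For example, `build_prompt([{"role": "user", "content": "Hello?"}])` yields a single user turn followed by an empty Assistant header for the model to complete.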

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each config specifies the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
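The top_k, top_p, and min_p settings all prune the token distribution before sampling. A minimal sketch over a toy probability dict (illustrative only; real inference engines apply these filters to logits, and the order of application varies by engine):

```python
def filter_tokens(probs, top_k=0, top_p=1.0, min_p=0.0):
    """Apply top-k, min-p, and top-p (nucleus) filtering to a
    token->probability dict, then renormalize the survivors."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]                     # keep k most likely tokens
    if min_p > 0.0:
        threshold = min_p * items[0][1]           # scaled by the top probability
        items = [(t, p) for t, p in items if p >= threshold]
    if top_p < 1.0:
        kept, cum = [], 0.0
        for t, p in items:                        # smallest set covering top_p mass
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        items = kept
    total = sum(p for _, p in items)
    return {t: p / total for t, p in items}
```

For instance, with `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}`, `top_k=2` keeps only `a` and `b` (renormalized), while `min_p=0.2` drops any token below 20% of the top token's probability.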