macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo

Text generation · Concurrency cost: 1 · Model size: 10.7B · Quantization: FP8 · Context length: 4K · Published: Jan 25, 2024 · License: cc · Architecture: Transformer

macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo is a 10.7 billion parameter instruction-tuned language model, fine-tuned by macadeliccc from upstageai/Solar-10.7b-Instruct-v0.1. The model was further refined with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and jondurbin/truthy-dpo-v0.1 datasets, aiming to enhance truthfulness and instruction following. It achieves an average score of 61.26% across AGIEval, GPT4All, TruthfulQA, and Bigbench, making it suitable for general conversational AI and tasks where factual accuracy matters.


Model Overview

macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo is a 10.7 billion parameter instruction-tuned language model, developed by macadeliccc. It is a fine-tuned version of upstageai/Solar-10.7b-Instruct-v0.1, enhanced through a two-step DPO (Direct Preference Optimization) process.

Training Process

  1. Initial Fine-tuning: The base model, upstageai/Solar-10.7b-Instruct-v0.1, was fine-tuned for one epoch using the Intel/orca_dpo_pairs dataset, which contains 12.4k samples.
  2. Further Refinement: This intermediate model was then further fine-tuned for three epochs with the jondurbin/truthy-dpo-v0.1 dataset, comprising 1.04k samples. This experimental process aims to improve the model's truthfulness and adherence to instructions.
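Both stages above optimize the same per-pair preference objective. A minimal sketch of the DPO loss for a single chosen/rejected pair, in pure Python with a numerically stable log-sigmoid (the `beta` value here is illustrative, not the one used for this model's training):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares how much the policy (vs. the frozen
    reference model) prefers the chosen completion over the rejected one."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Stable form of -log(sigmoid(margin)) = log(1 + exp(-margin)).
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))
```

When the policy shifts probability toward the chosen completion relative to the reference, the margin grows and the loss falls; in practice libraries such as `trl` batch this over sequence log-probabilities.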

Performance & Benchmarks

The model has been evaluated across several benchmarks, demonstrating its capabilities in various domains:

  • Overall Average Score: 61.26% across AGIEval, GPT4All, TruthfulQA, and Bigbench.
  • TruthfulQA: Achieved an average of 76.81%, indicating a focus on factual accuracy.
  • GPT4All: Scored 73.82% on average, with strong performance on tasks like BoolQ (88.20%) and HellaSwag (86.39% acc_norm).
  • Open LLM Leaderboard: Achieved an average score of 74.11, with specific metrics including MMLU (65.45%) and TruthfulQA (76.75%).

Use Cases

This model is suitable for applications requiring instruction-following and a focus on generating truthful responses. Its performance on benchmarks suggests utility in general conversational AI, question answering, and tasks where factual correctness is prioritized.
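For such instruction-following use, prompts are typically formatted with the base SOLAR instruct template. A minimal sketch of a prompt builder, assuming the `### User:` / `### Assistant:` markers inherited from the base model (verify against the tokenizer's chat template before relying on it):

```python
def build_prompt(messages):
    """Format a chat history into the assumed SOLAR instruction
    template, ending with an open Assistant turn for generation."""
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"### {role}:\n{m['content']}")
    parts.append("### Assistant:\n")  # model completes from here
    return "\n\n".join(parts)
```

For example, `build_prompt([{"role": "user", "content": "Hello?"}])` yields a single user turn followed by an empty Assistant header for the model to complete.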

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each config specifies the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
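The top_k, top_p, and min_p settings all prune the token distribution before sampling. A minimal sketch over a toy probability dict (illustrative only; real inference engines apply these filters to logits, and the order of application varies by engine):

```python
def filter_tokens(probs, top_k=0, top_p=1.0, min_p=0.0):
    """Apply top-k, min-p, and top-p (nucleus) filtering to a
    token->probability dict, then renormalize the survivors."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]                     # keep k most likely tokens
    if min_p > 0.0:
        threshold = min_p * items[0][1]           # scaled by the top probability
        items = [(t, p) for t, p in items if p >= threshold]
    if top_p < 1.0:
        kept, cum = [], 0.0
        for t, p in items:                        # smallest set covering top_p mass
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        items = kept
    total = sum(p for _, p in items)
    return {t: p / total for t, p in items}
```

For instance, with `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}`, `top_k=2` keeps only `a` and `b` (renormalized), while `min_p=0.2` drops any token below 20% of the top token's probability.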