dddsaty/SOLAR_Merge_Adapter_DPO_Orca
Hosted on Hugging Face · Text Generation
Model Size: 10.7B · Quant: FP8 · Context Length: 4k · Concurrency Cost: 1
Published: Feb 5, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer (Open Weights)

The dddsaty/SOLAR_Merge_Adapter_DPO_Orca is a 10.7 billion parameter language model created by dddsaty, built by merging two base SOLAR models and applying DPO fine-tuning. This model leverages the strengths of upstage/SOLAR-10.7B-Instruct-v1.0 and beomi/OPEN-SOLAR-KO-10.7B, further refined using the Intel/orca_dpo_pairs dataset. It demonstrates competitive performance across various benchmarks, including ARC, HellaSwag, MMLU, and GSM8K, making it suitable for general-purpose instruction-following tasks.


Model Overview

The dddsaty/SOLAR_Merge_Adapter_DPO_Orca is a 10.7 billion parameter language model developed by dddsaty. It is constructed through a multi-stage process:

  • Base Model Merging: Two base models, upstage/SOLAR-10.7B-Instruct-v1.0 and beomi/OPEN-SOLAR-KO-10.7B, were merged using the mergekit (slerp) method.
  • DPO Fine-tuning: The merged model then underwent Direct Preference Optimization (DPO) using the Intel/orca_dpo_pairs training corpus. Only the adapter part was saved during this stage.
  • Final Merge: The DPO adapter was subsequently merged back into the base merged model to create the final version.
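The first step can be expressed as a mergekit slerp configuration. The sketch below is illustrative only: the layer range, interpolation factor `t`, and dtype are assumptions, not the author's published settings.

```yaml
# Illustrative mergekit slerp config (not the author's actual file).
slices:
  - sources:
      - model: upstage/SOLAR-10.7B-Instruct-v1.0
        layer_range: [0, 48]   # SOLAR-10.7B uses 48 transformer layers
      - model: beomi/OPEN-SOLAR-KO-10.7B
        layer_range: [0, 48]
merge_method: slerp
base_model: upstage/SOLAR-10.7B-Instruct-v1.0
parameters:
  t: 0.5                       # assumed equal-weight interpolation
dtype: float16
```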

Performance Benchmarks

This model exhibits solid performance across a range of academic benchmarks, with an average score of 65.96. Key scores include:

  • ARC: 63.91
  • HellaSwag: 84.58
  • MMLU: 63.18
  • TruthfulQA: 51.49
  • Winogrande: 82.00
  • GSM8K: 50.57
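The reported 65.96 average is simply the mean of the six benchmark scores above, which can be verified directly:

```python
# Benchmark scores as listed above (Open LLM Leaderboard style).
scores = {
    "ARC": 63.91,
    "HellaSwag": 84.58,
    "MMLU": 63.18,
    "TruthfulQA": 51.49,
    "Winogrande": 82.00,
    "GSM8K": 50.57,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # matches the reported 65.96 average, to rounding
```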

Intended Use Cases

Given its architecture and DPO fine-tuning, this model is well-suited for general instruction-following tasks, leveraging the combined strengths of its base models and the preference alignment from the Orca DPO dataset. Its 4096-token context length supports a variety of conversational and text generation applications.
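For instruction-following use, prompts should follow the chat template of the upstage base model. The helper below assumes the `### User:` / `### Assistant:` template used by upstage/SOLAR-10.7B-Instruct-v1.0; verify against the model's tokenizer chat template before relying on it.

```python
def format_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the SOLAR-Instruct style.

    Template is assumed from upstage/SOLAR-10.7B-Instruct-v1.0, not
    confirmed by this model card.
    """
    return f"### User:\n{user_message}\n\n### Assistant:\n"

print(format_prompt("Summarize DPO in one sentence."))
```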

Popular Sampler Settings

The three most popular parameter combinations among Featherless users for this model adjust the following samplers: temperature, top_p, top_k, min_p, frequency_penalty, presence_penalty, and repetition_penalty.
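When calling the model through an OpenAI-compatible API, these sampler parameters are passed per request. A minimal sketch of such a payload follows; the values are illustrative defaults, not the actual Featherless user configurations.

```python
# Illustrative sampler settings (example values, not Featherless data).
sampler_settings = {
    "temperature": 0.7,         # randomness of token selection
    "top_p": 0.9,               # nucleus sampling cutoff
    "top_k": 40,                # restrict to the 40 most likely tokens
    "min_p": 0.05,              # drop tokens below 5% of the top probability
    "frequency_penalty": 0.0,   # penalize tokens by occurrence count
    "presence_penalty": 0.0,    # penalize tokens that have appeared at all
    "repetition_penalty": 1.1,  # multiplicative repetition discouragement
}

request_body = {
    "model": "dddsaty/SOLAR_Merge_Adapter_DPO_Orca",
    "prompt": "Explain model merging briefly.",
    **sampler_settings,
}
```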