kyujinpy/Sakura-SOLRCA-Instruct-DPO

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 10.7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 24, 2023 · License: cc-by-nc-sa-4.0 · Architecture: Transformer · Open Weights · Warm

Sakura-SOLRCA-Instruct-DPO is a 10.7 billion parameter instruction-tuned causal language model developed by Kyujin Han and the LLM research consortium of Media Group Saramgwasup and Marker. This model, fine-tuned using the DPO method on the Intel/orca_dpo_pairs dataset, demonstrates strong performance across various benchmarks, achieving an average score of 74.05 on the Open LLM Leaderboard. It is designed for general-purpose instruction following and reasoning tasks, offering competitive capabilities for its size.


Model Overview

Sakura-SOLRCA-Instruct-DPO is a 10.7 billion parameter instruction-tuned language model developed by Kyujin Han (kyujinpy) in collaboration with the LLM research consortium of Media Group Saramgwasup and Marker. This model was fine-tuned using the Direct Preference Optimization (DPO) method, leveraging the high-quality Intel/orca_dpo_pairs dataset to enhance its instruction-following capabilities.

Key Capabilities & Performance

The model exhibits robust performance across a range of benchmarks, as evaluated on the Hugging Face Open LLM Leaderboard. It achieved an average score of 74.05, with notable results in:

  • AI2 Reasoning Challenge (ARC): 71.16
  • HellaSwag: 88.49
  • MMLU: 66.17
  • TruthfulQA: 72.10
  • Winogrande: 82.95
  • GSM8K: 63.46

These scores indicate strong general reasoning, common-sense, and instruction-following abilities. The model's training details and code are openly shared in the Sakura-SOLAR GitHub repository.
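As a quick sanity check, the quoted leaderboard average can be reproduced from the six per-benchmark scores above (the leaderboard reports the unweighted mean):

```python
# Recompute the Open LLM Leaderboard average from the per-benchmark scores.
scores = {
    "ARC": 71.16,
    "HellaSwag": 88.49,
    "MMLU": 66.17,
    "TruthfulQA": 72.10,
    "Winogrande": 82.95,
    "GSM8K": 63.46,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # matches the reported 74.05 (up to rounding)
```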

When to Use This Model

Sakura-SOLRCA-Instruct-DPO is suitable for applications requiring a capable instruction-following model of its size. Its balanced performance across various benchmarks makes it a strong candidate for:

  • General-purpose conversational AI.
  • Reasoning and question-answering tasks.
  • Applications where a 10.7B parameter model offers a good balance between performance and computational efficiency.
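For readers who want to try the model locally, here is a minimal sketch using Hugging Face `transformers`. The `### User / ### Assistant` prompt template is an assumption (a common convention for SOLAR-derived instruct models), as is the `generate_reply` helper; check the model card's tokenizer/chat template before relying on it:

```python
# Sketch: running kyujinpy/Sakura-SOLRCA-Instruct-DPO with transformers.
# The prompt template below is an ASSUMPTION (SOLAR-style turns), not taken
# from the model card; verify against the tokenizer's chat template.

def build_prompt(user_message: str) -> str:
    """Format a single-turn instruction prompt (assumed SOLAR-style template)."""
    return f"### User:\n{user_message}\n\n### Assistant:\n"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    """Download the 10.7B weights and generate a reply (heavyweight; call explicitly)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "kyujinpy/Sakura-SOLRCA-Instruct-DPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
    # Strip the prompt tokens so only the completion is returned.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

At FP8 quantization, the 10.7B weights fit comfortably on a single 24 GB GPU; with the full-precision checkpoint, expect roughly 21 GB in FP16.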

Popular Sampler Settings

The three most popular parameter combinations among Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
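To make the two most influential of these parameters concrete, here is a minimal, self-contained sketch of temperature scaling and top-p (nucleus) sampling over a toy vocabulary. This is an illustrative reference implementation, not Featherless's actual sampler:

```python
import math
import random

# Illustrative sketch: temperature rescales logits before softmax (lower =
# sharper distribution); top_p keeps only the smallest set of highest-probability
# tokens whose cumulative mass reaches p, then renormalizes and samples.

def sample_next_token(logits, temperature=1.0, top_p=1.0, seed=None):
    """Sample one token id from {token: logit} with temperature and top-p."""
    rng = random.Random(seed)

    # Temperature scaling, then a numerically stable softmax.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    z = max(scaled.values())
    exps = {tok: math.exp(l - z) for tok, l in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Nucleus truncation: keep tokens in descending probability order until
    # their cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalize over the kept set and draw.
    mass = sum(p for _, p in kept)
    r, acc = rng.random() * mass, 0.0
    for tok, p in kept:
        acc += p
        if acc >= r:
            return tok
    return kept[-1][0]
```

With a dominant logit and a small `top_p`, the nucleus collapses to a single token, so sampling becomes deterministic; raising `temperature` flattens the distribution and re-admits lower-probability tokens.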