kyujinpy/Sakura-SOLAR-Instruct-DPO-v2

Source: Hugging Face

- Task: text generation
- Model size: 10.7B
- Quantization: FP8
- Context length: 4K
- Concurrency cost: 1
- Published: Dec 24, 2023
- License: cc-by-nc-sa-4.0
- Architecture: Transformer

kyujinpy/Sakura-SOLAR-Instruct-DPO-v2 is a 10.7 billion parameter instruction-tuned causal language model developed by Kyujin Han (kyujinpy) in collaboration with Media Group Saramgwasup and Marker. The model was fine-tuned with Direct Preference Optimization (DPO) on the argilla/distilabel-math-preference-dpo dataset. It achieves an average score of 74.14 on the Open LLM Leaderboard, reflecting strong performance across a range of reasoning and language understanding benchmarks.


Sakura-SOLAR-Instruct-DPO-v2 Overview

Sakura-SOLAR-Instruct-DPO-v2 is a 10.7 billion parameter language model developed by Kyujin Han (kyujinpy) as part of an LLM research consortium with Media Group Saramgwasup and Marker. This model is an instruction-tuned variant, specifically enhanced using the Direct Preference Optimization (DPO) method.

Key Capabilities & Training

  • DPO Fine-tuning: The model was fine-tuned with Direct Preference Optimization (DPO) on the argilla/distilabel-math-preference-dpo dataset, suggesting an emphasis on mathematical reasoning and preference alignment (a minimal training sketch follows this list).
  • Benchmark Performance: On the Open LLM Leaderboard, Sakura-SOLAR-Instruct-DPO-v2 achieves an average score of 74.14. Notable scores include:
    • AI2 Reasoning Challenge (ARC): 70.90
    • HellaSwag: 88.41
    • MMLU: 66.48
    • TruthfulQA: 71.86
    • Winogrande: 83.43
    • GSM8k: 63.76
  • Model Lineage: This version iterates on kyujinpy/Sakura-SOLAR-Instruct, with slight improvements on metrics such as MMLU and GSM8k.
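For illustration, here is a minimal sketch of what DPO fine-tuning on this preference dataset could look like using Hugging Face TRL. This is not the author's actual training script: the dataset column names, the hyperparameters, and the choice of base checkpoint are assumptions.

```python
# A minimal DPO fine-tuning sketch with Hugging Face TRL, not the author's
# actual recipe. Column names and hyperparameters are assumptions; inspect
# the dataset and tune the config before using this seriously.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "kyujinpy/Sakura-SOLAR-Instruct"  # assumed pre-DPO checkpoint
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

dataset = load_dataset("argilla/distilabel-math-preference-dpo", split="train")
# Map to the prompt/chosen/rejected fields DPOTrainer expects.
# The source column names below are assumptions.
dataset = dataset.map(
    lambda row: {
        "prompt": row["instruction"],
        "chosen": row["chosen_response"],
        "rejected": row["rejected_response"],
    },
    remove_columns=dataset.column_names,
)

args = DPOConfig(
    output_dir="sakura-solar-dpo",
    beta=0.1,  # DPO temperature; example value only
    per_device_train_batch_size=1,
)
# With no explicit ref_model, TRL clones the policy as the frozen reference.
# In older TRL versions, pass tokenizer= instead of processing_class=.
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```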

Usage

This model is suitable for tasks requiring instruction following and general language understanding, particularly benefiting from its DPO fine-tuning for improved response quality and alignment. Its performance across various benchmarks indicates its versatility for a range of NLP applications.
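A minimal sketch of loading the model locally with Hugging Face transformers is shown below. The "### User:" / "### Assistant:" prompt format is assumed from the SOLAR-Instruct lineage; check the model card for the exact template before relying on it.

```python
# Minimal local inference sketch with transformers. The prompt template is
# an assumption based on the SOLAR-Instruct lineage, not confirmed by this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyujinpy/Sakura-SOLAR-Instruct-DPO-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~21 GB of weights at fp16 for 10.7B params
    device_map="auto",          # requires the accelerate package
)

prompt = "### User:\nWhat is 17 * 23?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```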

Popular Sampler Settings

The three parameter combinations most commonly used by Featherless users for this model tune the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
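For reference, here is a hedged sketch of passing such sampler settings through an OpenAI-compatible endpoint like the one Featherless provides. The base URL, the extra_body mechanism for non-standard samplers, and all values shown are assumptions for illustration, not one of the recorded popular configurations.

```python
# Sketch of setting sampler parameters via an OpenAI-compatible API.
# The endpoint URL and the server's support for the extra_body parameters
# are assumptions; the values are placeholders, not recommended settings.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="kyujinpy/Sakura-SOLAR-Instruct-DPO-v2",
    messages=[{"role": "user", "content": "Explain DPO in one sentence."}],
    temperature=0.7,           # standard OpenAI-style parameters
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={               # non-standard samplers, if the server accepts them
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```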