kyujinpy/Sakura-SOLAR-Instruct-DPO-v2

Source: Hugging Face

- Task: text generation
- Model size: 10.7B
- Quantization: FP8
- Context length: 4K
- Concurrency cost: 1
- Published: Dec 24, 2023
- License: cc-by-nc-sa-4.0
- Architecture: Transformer

kyujinpy/Sakura-SOLAR-Instruct-DPO-v2 is a 10.7 billion parameter instruction-tuned causal language model developed by Kyujin Han (kyujinpy) in collaboration with Media Group Saramgwasup and Marker. The model was fine-tuned with Direct Preference Optimization (DPO) on the argilla/distilabel-math-preference-dpo dataset. It achieves an average score of 74.14 on the Open LLM Leaderboard, reflecting strong performance across a range of reasoning and language understanding benchmarks.


Sakura-SOLAR-Instruct-DPO-v2 Overview

Sakura-SOLAR-Instruct-DPO-v2 is a 10.7 billion parameter language model developed by Kyujin Han (kyujinpy) as part of an LLM research consortium with Media Group Saramgwasup and Marker. This model is an instruction-tuned variant, specifically enhanced using the Direct Preference Optimization (DPO) method.

Key Capabilities & Training

  • DPO Fine-tuning: The model was fine-tuned with Direct Preference Optimization (DPO) on the argilla/distilabel-math-preference-dpo dataset, suggesting an emphasis on mathematical reasoning and preference alignment (a minimal training sketch follows this list).
  • Benchmark Performance: On the Open LLM Leaderboard, Sakura-SOLAR-Instruct-DPO-v2 achieves an average score of 74.14. Notable scores include:
    • AI2 Reasoning Challenge (ARC): 70.90
    • HellaSwag: 88.41
    • MMLU: 66.48
    • TruthfulQA: 71.86
    • Winogrande: 83.43
    • GSM8k: 63.76
  • Model Lineage: This version iterates on kyujinpy/Sakura-SOLAR-Instruct, with slight improvements on metrics such as MMLU and GSM8k.
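For illustration, here is a minimal sketch of what DPO fine-tuning on this preference dataset could look like using Hugging Face TRL. This is not the author's actual training script: the dataset column names, the hyperparameters, and the choice of base checkpoint are assumptions.

```python
# A minimal DPO fine-tuning sketch with Hugging Face TRL, not the author's
# actual recipe. Column names and hyperparameters are assumptions; inspect
# the dataset and tune the config before using this seriously.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "kyujinpy/Sakura-SOLAR-Instruct"  # assumed pre-DPO checkpoint
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

dataset = load_dataset("argilla/distilabel-math-preference-dpo", split="train")
# Map to the prompt/chosen/rejected fields DPOTrainer expects.
# The source column names below are assumptions.
dataset = dataset.map(
    lambda row: {
        "prompt": row["instruction"],
        "chosen": row["chosen_response"],
        "rejected": row["rejected_response"],
    },
    remove_columns=dataset.column_names,
)

args = DPOConfig(
    output_dir="sakura-solar-dpo",
    beta=0.1,  # DPO temperature; example value only
    per_device_train_batch_size=1,
)
# With no explicit ref_model, TRL clones the policy as the frozen reference.
# In older TRL versions, pass tokenizer= instead of processing_class=.
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```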

Usage

This model is suitable for tasks requiring instruction following and general language understanding, particularly benefiting from its DPO fine-tuning for improved response quality and alignment. Its performance across various benchmarks indicates its versatility for a range of NLP applications.
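A minimal sketch of loading the model locally with Hugging Face transformers is shown below. The "### User:" / "### Assistant:" prompt format is assumed from the SOLAR-Instruct lineage; check the model card for the exact template before relying on it.

```python
# Minimal local inference sketch with transformers. The prompt template is
# an assumption based on the SOLAR-Instruct lineage, not confirmed by this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyujinpy/Sakura-SOLAR-Instruct-DPO-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~21 GB of weights at fp16 for 10.7B params
    device_map="auto",          # requires the accelerate package
)

prompt = "### User:\nWhat is 17 * 23?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```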

Popular Sampler Settings

The three parameter combinations most commonly used by Featherless users for this model tune the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
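For reference, here is a hedged sketch of passing such sampler settings through an OpenAI-compatible endpoint like the one Featherless provides. The base URL, the extra_body mechanism for non-standard samplers, and all values shown are assumptions for illustration, not one of the recorded popular configurations.

```python
# Sketch of setting sampler parameters via an OpenAI-compatible API.
# The endpoint URL and the server's support for the extra_body parameters
# are assumptions; the values are placeholders, not recommended settings.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="kyujinpy/Sakura-SOLAR-Instruct-DPO-v2",
    messages=[{"role": "user", "content": "Explain DPO in one sentence."}],
    temperature=0.7,           # standard OpenAI-style parameters
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={               # non-standard samplers, if the server accepts them
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```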