kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2

Hugging Face
Text Generation · Concurrency cost: 1 · Model size: 10.7B · Quant: FP8 · Context length: 4k · Published: Dec 26, 2023 · License: cc-by-nc-sa-4.0 · Architecture: Transformer · Open weights

The Sakura-SOLRCA-Math-Instruct-DPO-v2 is a 10.7 billion parameter instruction-tuned causal language model developed by Kyujin Han (kyujinpy). This model is fine-tuned using the DPO method on mathematical and reasoning datasets, including Intel/orca_dpo_pairs and argilla/distilabel-math-preference-dpo. It is optimized for mathematical reasoning and general instruction following, achieving a 63.91% score on GSM8K and a 74.17% average across various benchmarks.


Model Overview

The kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2 is a 10.7 billion parameter instruction-tuned language model developed by Kyujin Han (kyujinpy). It was fine-tuned with Direct Preference Optimization (DPO) on a combination of preference datasets, specifically Intel/orca_dpo_pairs and argilla/distilabel-math-preference-dpo. A merged version of these datasets, kyujinpy/orca_math_dpo, was used for training.
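
DPO optimizes the policy directly on preference pairs, with no separate reward model. A minimal sketch of the standard per-pair DPO loss (illustrative only, not kyujinpy's training code; the argument names are assumptions):

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed token log-probability of a full response
    under the trained policy or the frozen reference model.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Before the policy moves away from the reference, both margins are zero,
# so the loss starts at -log(0.5).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))  # → 0.6931471805599453
```

Minimizing this loss pushes the policy to assign relatively more probability to the chosen response than to the rejected one, scaled by `beta`.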

Key Capabilities & Performance

This model is primarily designed for mathematical reasoning and general instruction following. Its performance is highlighted by its benchmark results on the Open LLM Leaderboard:

  • Average Score: 74.17%
  • GSM8K (5-shot): 63.91% (indicating strong mathematical problem-solving ability)
  • MMLU (5-shot): 66.13%
  • HellaSwag (10-shot): 88.52%
  • ARC (25-shot): 71.25%

The v2 iteration improves slightly on its predecessor, Sakura-SOLRCA-Math-Instruct-DPO-v1, most notably in overall average and GSM8K scores.

When to Use This Model

This model is particularly well-suited for applications requiring:

  • Mathematical problem-solving and reasoning.
  • General instruction-following tasks where accuracy and logical coherence are important.
  • Research and development in DPO-based fine-tuning for specialized tasks.
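
For local experimentation, the checkpoint loads through the standard Hugging Face `transformers` API. A minimal sketch (the model id comes from this card; the dtype, device, and helper names are assumptions):

```python
MODEL_ID = "kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2"

def load_model(device: str = "cuda"):
    """Download and load the model and tokenizer (roughly 21 GB of weights at fp16)."""
    # Lazy import so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # keep the checkpoint's native precision
        device_map=device,
    )
    return model, tokenizer

def solve(prompt: str, model, tokenizer, max_new_tokens: int = 256) -> str:
    """Generate a completion and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

For math prompts, stating the problem plainly (e.g. "Solve step by step: ...") plays to the model's GSM8K-style training.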

Popular Sampler Settings

Featherless surfaces the three parameter combinations most commonly used with this model. Each configuration sets the following sampling parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
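
As a sketch of how these parameters fit together, here is an illustrative configuration dictionary. The values are placeholders chosen for demonstration, NOT the actual Featherless user configs:

```python
# Illustrative sampling configuration; all values are placeholder assumptions.
sampler_config = {
    "temperature": 0.7,         # softens/sharpens the output distribution
    "top_p": 0.9,               # nucleus sampling: smallest token set with cum. prob >= 0.9
    "top_k": 40,                # restrict sampling to the 40 most likely tokens
    "frequency_penalty": 0.0,   # penalize tokens by how often they already appeared
    "presence_penalty": 0.0,    # penalize tokens that appeared at all
    "repetition_penalty": 1.1,  # multiplicative penalty on repeated tokens
    "min_p": 0.05,              # drop tokens below 5% of the top token's probability
}
```

With Hugging Face `transformers`, temperature, top_p, top_k, and repetition_penalty can be passed directly as `model.generate(...)` keyword arguments; frequency_penalty and presence_penalty are OpenAI-style API parameters supported by hosted endpoints rather than by `generate` itself.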