kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2
Sakura-SOLRCA-Math-Instruct-DPO-v2 is a 10.7-billion-parameter instruction-tuned causal language model developed by Kyujin Han (kyujinpy). It is fine-tuned with Direct Preference Optimization (DPO) on mathematical and reasoning preference datasets, including Intel/orca_dpo_pairs and argilla/distilabel-math-preference-dpo. The model is optimized for mathematical reasoning and general instruction following, scoring 63.91% on GSM8K and averaging 74.17% across the Open LLM Leaderboard benchmarks.
Model Overview
kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2 is a 10.7-billion-parameter instruction-tuned language model developed by Kyujin Han (kyujinpy). It was fine-tuned with the Direct Preference Optimization (DPO) method on a combination of preference datasets, specifically Intel/orca_dpo_pairs and argilla/distilabel-math-preference-dpo; a merged version of these datasets, kyujinpy/orca_math_dpo, was used in training.
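Since the checkpoint is published under the repository id above, it can presumably be loaded with the standard Hugging Face transformers causal-LM API. The sketch below makes two assumptions not stated on this card: that the default `AutoModelForCausalLM` classes apply, and that an Alpaca-style `### Instruction: / ### Response:` prompt template is appropriate (check the upstream model card for the exact template).

```python
# Minimal inference sketch (assumptions: standard transformers causal-LM
# loading, Alpaca-style prompt template).
MODEL_ID = "kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2"

def build_prompt(instruction: str) -> str:
    """Alpaca-style instruction prompt (an assumption, not confirmed
    by this card; verify against the model's own documentation)."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

def generate(instruction: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # A 10.7B model needs roughly 22 GB of memory in fp16;
    # device_map="auto" spreads it across available devices (needs accelerate).
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype="auto"
    )
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Greedy decoding (`do_sample=False`) is a reasonable default for math problems, where a single deterministic chain of reasoning is usually preferable to sampled variety.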
Key Capabilities & Performance
This model is primarily designed for mathematical reasoning and general instruction following. Its performance is highlighted by its benchmark results on the Open LLM Leaderboard:
- Average Score: 74.17%
- GSM8K (5-shot): 63.91% (indicating strong mathematical problem-solving ability)
- MMLU (5-shot): 66.13%
- HellaSwag (10-shot): 88.52%
- ARC (25-shot): 71.25%
The v2 iteration shows slight improvements over its predecessor, Sakura-SOLRCA-Math-Instruct-DPO-v1, particularly in overall average and GSM8K scores, demonstrating continuous refinement in its mathematical and reasoning capabilities.
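The "n-shot" settings above mean each benchmark question is preceded by n worked examples in the prompt. A minimal sketch of how such a few-shot prompt is assembled (the exemplars here are illustrative toy problems, not items from the actual GSM8K set, and the exact formatting used by the leaderboard harness may differ):

```python
def build_few_shot_prompt(exemplars, question):
    """Concatenate worked Q/A examples followed by the test question,
    mirroring the n-shot evaluation setup reported on the leaderboard."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Illustrative exemplars (NOT drawn from the real benchmark data):
EXEMPLARS = [
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "3 + 2 = 5. The answer is 5."),
    ("A book costs $4. How much do 6 books cost?",
     "6 * 4 = 24. The answer is 24."),
]

prompt = build_few_shot_prompt(
    EXEMPLARS, "If 12 cookies are shared equally by 4 kids, how many does each get?"
)
```

The prompt ends with a bare `Answer:` so the model's continuation is scored as its solution to the final question.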
When to Use This Model
This model is particularly well-suited for applications requiring:
- Mathematical problem-solving and reasoning.
- General instruction-following tasks where accuracy and logical coherence are important.
- Research and development in DPO-based fine-tuning for specialized tasks.
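For the last point, the core of DPO is a single loss over preference pairs: it pushes the policy to assign a higher likelihood margin (relative to a frozen reference model) to the chosen response than to the rejected one. A minimal per-pair sketch, using only summed log-probabilities as inputs (this is the standard DPO objective, not code from this model's actual training run):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference model; beta scales the implicit KL penalty.
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed stably as softplus(-logits).
    return math.log1p(math.exp(-logits)) if logits > -30 else -logits

# When the policy favors the chosen response more than the reference does,
# the loss drops below log(2), the value at indifference:
assert dpo_loss(-10.0, -12.0, -11.0, -11.0) < math.log(2)
```

Datasets such as Intel/orca_dpo_pairs supply exactly the (prompt, chosen, rejected) triples this loss consumes, which is why merging them with a math-preference set steers the model toward preferred mathematical reasoning.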