kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1
Sakura-SOLRCA-Math-Instruct-DPO-v1 is a 10.7 billion parameter instruction-tuned causal language model developed by Kyujin Han (kyujinpy). It was fine-tuned with Direct Preference Optimization (DPO) on a combination of the Intel/orca_dpo_pairs and argilla/distilabel-math-preference-dpo datasets. The model is tuned for mathematical reasoning and general instruction following, scoring 63.84 on GSM8K and averaging 74.13 on the Open LLM Leaderboard.
Overview
kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1 is a 10.7 billion parameter instruction-tuned language model developed by Kyujin Han (kyujinpy). It was fine-tuned with Direct Preference Optimization (DPO) on a combination of the Intel/orca_dpo_pairs and argilla/distilabel-math-preference-dpo datasets. A merged version of these datasets, kyujinpy/orca_math_dpo, was also used in its development.
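For readers unfamiliar with DPO, the sketch below shows the standard per-pair DPO loss in plain Python. It is an illustration of the general objective, not the author's training code: the function name `dpo_loss`, the example log-probabilities, and the value `beta=0.1` are all assumptions chosen for the example. Each preference pair (such as one row from the datasets named above) supplies a chosen and a rejected response, and the loss falls as the fine-tuned policy favors the chosen response more strongly than the frozen reference model does.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference model. beta=0.1 is an illustrative value, not a setting
    taken from this model's training run.
    """
    # How much more (in log space) the policy likes each response
    # than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for one preference pair.
loss_neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)   # policy equals reference
loss_improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)  # policy prefers chosen
print(loss_neutral, loss_improved)
```

When the policy and reference agree, the margin is zero and the loss is log 2; shifting probability toward the chosen response lowers it.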
Key Capabilities
- Mathematical Reasoning: Demonstrates strong performance in mathematical problem-solving, as indicated by its 63.84 score on the GSM8K benchmark.
- Instruction Following: Designed to follow instructions across a range of tasks. Its Open LLM Leaderboard average is 74.13, taken over six benchmarks (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K).
- General Language Understanding: Scores on the remaining leaderboard benchmarks are ARC (71.25), HellaSwag (88.48), MMLU (66.21), TruthfulQA (72.12), and Winogrande (82.87).
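The reported average can be checked against the individual scores. The snippet below assumes the leaderboard average is the unweighted mean of the six benchmark scores quoted above; the numbers are consistent with that assumption.

```python
# Benchmark scores as quoted in this model card.
scores = {
    "ARC": 71.25,
    "HellaSwag": 88.48,
    "MMLU": 66.21,
    "TruthfulQA": 72.12,
    "Winogrande": 82.87,
    "GSM8K": 63.84,
}

# Assumption: the leaderboard average is an unweighted mean.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 74.13
```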
Good For
- Applications requiring robust mathematical problem-solving capabilities.
- General-purpose instruction-following tasks where accuracy is critical.
- Developers looking for a DPO-tuned model with competitive benchmark performance in its size class.