kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1
Text Generation · Concurrency Cost: 1 · Model Size: 10.7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 25, 2023 · License: cc-by-nc-sa-4.0 · Architecture: Transformer · Open Weights · Warm

Sakura-SOLRCA-Math-Instruct-DPO-v1 is a 10.7-billion-parameter instruction-tuned causal language model developed by Kyujin Han (kyujinpy). It was fine-tuned with Direct Preference Optimization (DPO) on a combination of the Intel/orca_dpo_pairs and argilla/distilabel-math-preference-dpo datasets. The model is optimized for mathematical reasoning and general instruction following, scoring 63.84 on GSM8K and averaging 74.13 on the Open LLM Leaderboard.
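As a sketch of how the model might be queried, the snippet below builds a request body for an OpenAI-compatible chat completions endpoint. The request schema, the math prompt, and the temperature value are illustrative assumptions, not Featherless documentation; only the model ID comes from this page.

```python
import json

# Hypothetical request body for an OpenAI-compatible chat completions
# endpoint. The schema and parameter values are illustrative assumptions;
# only the model ID is taken from the model card above.
payload = {
    "model": "kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1",
    "messages": [
        {
            "role": "user",
            "content": "A train travels 60 km in 45 minutes. "
                       "What is its average speed in km/h?",
        }
    ],
    "max_tokens": 256,
    "temperature": 0.7,  # illustrative value, not a documented default
}

# Serialize for an HTTP POST; sending it would require an API key
# and endpoint URL, which are omitted here.
body = json.dumps(payload)
```

The payload would then be POSTed with any HTTP client; since the model's context length is 4k, prompt plus `max_tokens` should stay within that budget.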


Popular Sampler Settings

The three most popular parameter combinations among Featherless users of this model are built from the following sampling parameters:

- temperature
- top_p
- top_k
- frequency_penalty
- presence_penalty
- repetition_penalty
- min_p
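The preset values themselves are only shown in the interactive tabs, but each parameter has a conventional valid range. The helper below is a minimal sketch that validates a sampler config before sending it; the function name, bounds, and example values are chosen here for illustration and reflect common API conventions, not Featherless-specific limits.

```python
def validate_sampler_config(cfg: dict) -> dict:
    """Check common sampler parameters against their usual valid ranges.

    Bounds reflect typical API conventions (assumption), not
    Featherless-specific limits.
    """
    bounds = {
        "temperature":        (0.0, 2.0),   # 0 = greedy, higher = more random
        "top_p":              (0.0, 1.0),   # nucleus sampling mass
        "min_p":              (0.0, 1.0),   # minimum relative token probability
        "frequency_penalty":  (-2.0, 2.0),  # penalize frequent tokens
        "presence_penalty":   (-2.0, 2.0),  # penalize already-seen tokens
        "repetition_penalty": (0.0, 2.0),   # multiplicative repeat penalty
    }
    for key, value in cfg.items():
        if key == "top_k":
            # top_k restricts sampling to the k highest-probability tokens
            if not (isinstance(value, int) and value >= 0):
                raise ValueError("top_k must be a non-negative integer")
        elif key in bounds:
            lo, hi = bounds[key]
            if not (lo <= value <= hi):
                raise ValueError(f"{key}={value} outside [{lo}, {hi}]")
    return cfg

# Hypothetical preset with illustrative values (not one of the actual
# "Top 3" configs, whose values are only visible in the page's tabs).
preset = validate_sampler_config({
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 40,
    "repetition_penalty": 1.1,
})
```

Validating client-side like this catches out-of-range values before a request is rejected by the server.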