kyujinpy/Sakura-SOLAR-Instruct-DPO-v2 is a 10.7-billion-parameter instruction-tuned causal language model developed by Kyujin Han (kyujinpy) in collaboration with Media Group Saramgwasup and Marker. The model was fine-tuned with Direct Preference Optimization (DPO) on the argilla/distilabel-math-preference-dpo dataset. It achieves an average score of 74.14 on the Open LLM Leaderboard, reflecting strong performance across reasoning and language-understanding benchmarks.
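A minimal usage sketch with the Hugging Face transformers library is shown below. The prompt template follows the common SOLAR-Instruct "### User: / ### Assistant:" convention and is an assumption, not taken from this card; adjust it to the template documented in the repository if it differs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "kyujinpy/Sakura-SOLAR-Instruct-DPO-v2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # ~21 GB of GPU memory for the 10.7B weights in fp16
    device_map="auto",          # requires the accelerate package
)

# Assumed SOLAR-Instruct-style prompt template.
prompt = "### User:\nExplain the difference between a list and a tuple in Python.\n\n### Assistant:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```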