SJ-Donald/SJ-SOLAR-10.7b-DPO

Text Generation | Concurrency Cost: 1 | Model Size: 15B | Quant: FP8 | Ctx Length: 8k | Tool Calling: Supported | Published: Jan 25, 2024 | License: cc-by-nc-4.0 | Architecture: Transformer | Open Weights | Cold

SJ-Donald/SJ-SOLAR-10.7b-DPO is a 10.7 billion parameter causal language model developed by SJ-Donald and fine-tuned with Direct Preference Optimization (DPO). Built on the SJ-Donald/SOLAR-10.7B-slerp base model, it scores an average of 72.67 on the Open LLM Leaderboard and 56.93 on the open-ko-llm-leaderboard for Korean language tasks. The model targets general language understanding and generation, with DPO fine-tuning applied to improve response quality.

Overview

SJ-Donald/SJ-SOLAR-10.7b-DPO is a 10.7 billion parameter language model developed by SJ-Donald and fine-tuned with Direct Preference Optimization (DPO). It is built on the SJ-Donald/SOLAR-10.7B-slerp base model and was trained on the SJ-Donald/orca-dpo-pairs-ko dataset, which points to a focus on preference-aligned responses, particularly in Korean.
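To make the DPO objective concrete, here is a minimal sketch of the standard per-pair DPO loss computed from sequence log-probabilities. It illustrates the general technique only and is not the training code used for this model. The function name `dpo_loss` and the default `beta=0.1` are assumptions chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `beta` (an assumed value here) scales how strongly the policy is
    pushed away from the reference.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference agree exactly, the loss equals log 2. It falls below that as the policy assigns relatively more probability to the chosen response than the reference does.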

Key Capabilities & Performance

Reported scores on two public leaderboards:

  • Open-LLM-Leaderboard: Achieves an average score of 72.67, with strong results in HellaSwag (86.95) and Winogrande (84.21).
  • open-ko-llm-leaderboard: Scores an average of 56.93 on Korean-specific tasks, including Ko-HellaSwag (61.99) and Ko-CommonGen V2 (58.44).

Use Cases

SJ-Donald/SJ-SOLAR-10.7b-DPO is suited to general language generation and understanding tasks, especially where DPO-aligned responses are beneficial. Its results on Korean benchmarks suggest particular utility for Korean language processing. The model can be loaded with Hugging Face Transformers, and a GGUF version is also available for efficient deployment.
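As a minimal sketch of the Transformers integration mentioned above, the following loads the model by the repository name given on this page and generates a completion. The helper names `load_model` and `generate` are hypothetical. Half precision, `device_map="auto"` (which needs the `accelerate` package), and a plain-text prompt without any chat template are assumptions rather than documented requirements of this model.

```python
MODEL_ID = "SJ-Donald/SJ-SOLAR-10.7b-DPO"  # repository name as given on this page


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model; the weights are downloaded on first use."""
    # Imported lazily so that defining these helpers does not require
    # torch or transformers to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: half precision to reduce memory
        device_map="auto",          # assumption: requires the `accelerate` package
    )
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for a plain-text prompt (prompt format is an assumption)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the full model, so it is left commented out):
# print(generate("Summarize the benefits of preference-based fine-tuning."))
```

For repeated calls, it would be better to call `load_model` once and reuse the returned objects, since `generate` as written reloads the model every time.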