yanolja/Bookworm-10.7B-v0.4-DPO
yanolja/Bookworm-10.7B-v0.4-DPO is a 10.7 billion parameter language model fine-tuned by Yanolja using Direct Preference Optimization (DPO). It is based on yanolja/KoSOLAR-10.7B-v0.2, a Korean vocabulary-extended version of Upstage's SOLAR-10.7B-v1.0. This model specializes in generating high-quality responses in Korean, leveraging training on Korean-translated versions of Open-Orca/SlimOrca-Dedup and argilla/ultrafeedback-binarized-preferences-cleaned datasets. It is designed for applications requiring nuanced Korean language understanding and generation.
Loading preview...
Model Overview
yanolja/Bookworm-10.7B-v0.4-DPO is a 10.7 billion parameter language model developed by Yanolja, building upon the KoSOLAR-10.7B-v0.2 architecture, which itself is a Korean vocabulary-extended variant of Upstage's SOLAR-10.7B-v1.0. This model has undergone Direct Preference Optimization (DPO) using the LLaMA-Factory framework, enhancing its ability to generate preferred responses.
Key Capabilities
- Korean Language Proficiency: Optimized for understanding and generating high-quality text in Korean.
- Preference Alignment: Fine-tuned with DPO to align outputs with human preferences, leading to more desirable and coherent responses.
- Robust Foundation: Benefits from the strong base architecture of SOLAR-10.7B-v1.0 and its Korean vocabulary extension.
Training Data
The model's DPO training utilized Korean-translated versions of two key datasets:
- Open-Orca/SlimOrca-Dedup: A deduplicated subset of instruction-following data.
- argilla/ultrafeedback-binarized-preferences-cleaned: A dataset containing binarized human preferences for model responses.
Ideal Use Cases
This model is particularly well-suited for applications requiring advanced Korean language processing, such as:
- Korean-centric chatbots and conversational AI.
- Content generation in Korean.
- Tasks benefiting from preference-aligned outputs in a Korean context.