etri-xainlp/SOLAR-10.7B-sft-dpo-v1
The etri-xainlp/SOLAR-10.7B-sft-dpo-v1 is a 10.7-billion-parameter language model developed by the ETRI xainlp team, built upon the davidkim205/nox-solar-10.7b-v4 base model. It was fine-tuned using supervised fine-tuning (SFT) on a dataset of 1.8 million instruction-following examples and further optimized with Direct Preference Optimization (DPO) on a dataset of 221k user preference examples. The model generates text outputs from text inputs, with its training targeted at improved instruction adherence and alignment with user preferences.
Model Overview
etri-xainlp/SOLAR-10.7B-sft-dpo-v1 is a 10.7 billion parameter language model developed by the ETRI xainlp team. It is based on the davidkim205/nox-solar-10.7b-v4 architecture and processes text inputs to generate text outputs.
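A minimal inference sketch with the Hugging Face `transformers` library is shown below. The model card does not specify a prompt template, so the plain-string prompt and the generation settings here are assumptions, not documented usage.

```python
def generate(prompt: str,
             model_id: str = "etri-xainlp/SOLAR-10.7B-sft-dpo-v1",
             max_new_tokens: int = 256) -> str:
    """Load the model and generate a completion for a plain-text prompt.

    Note: loading a 10.7B-parameter model requires substantial GPU memory;
    device_map="auto" lets accelerate shard it across available devices.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```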
Key Training Details
This model underwent a two-stage fine-tuning process:
- Supervised Fine-Tuning (SFT): The model was initially fine-tuned using a LoRA (Low-Rank Adaptation) approach on a substantial dataset comprising 1,821,734 instruction-following examples. This stage aims to enhance the model's ability to follow instructions and generate coherent responses.
- Direct Preference Optimization (DPO): Following SFT, the model was further optimized using DPO, also with LoRA, on a dataset of 221,869 user preference examples. This stage refines the model's outputs to better align with human preferences and quality judgments.
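Both stages above use LoRA, which freezes the base weight matrix W and trains only a low-rank update scaled by alpha/r. The toy sketch below illustrates the adapted forward pass; the shapes, rank, and alpha value are illustrative assumptions, not the actual training configuration.

```python
# LoRA-adapted linear layer: y = W x + (alpha/r) * B (A x).
# W is frozen; only the small matrices A (r x d_in) and B (d_out x r)
# are trained, which is why LoRA fine-tuning is memory-efficient.

def matvec(m, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=16.0, r=2):
    base = matvec(W, x)                    # frozen base path
    low_rank = matvec(B, matvec(A, x))     # trainable low-rank path
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]
```

Because B is conventionally initialized to zero, the adapted layer starts out identical to the frozen base layer, and training only gradually shifts its behavior.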
Training was conducted on eight A100 80GB GPUs. The combination of SFT and DPO aims to produce a model that both follows complex instructions and generates outputs preferred by users.
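The DPO stage optimizes the standard DPO objective: it increases the policy's log-probability margin of the preferred (chosen) response over the rejected one, relative to a frozen reference model. A minimal per-pair loss sketch is below; the beta value is an illustrative assumption.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l))).

    A larger margin (policy favors the chosen response more than the
    reference does) drives the loss toward zero.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss is log 2; as the policy learns to prefer the chosen responses, the loss decreases.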