etri-xainlp/SOLAR-10.7B-sft-dpo-v1

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 10.7B · Quantization: FP8 · Context length: 4k · License: cc-by-nc-4.0 · Architecture: Transformer · Open weights · Warm

etri-xainlp/SOLAR-10.7B-sft-dpo-v1 is a 10.7 billion parameter language model developed by the ETRI xainlp team, built upon the davidkim205/nox-solar-10.7b-v4 base model. It was fine-tuned with supervised fine-tuning (SFT) on a dataset of 1.8 million instruction-following examples and further optimized with Direct Preference Optimization (DPO) on a dataset of 221k user preference pairs. The model generates text outputs from text inputs, leveraging this specialized training for improved instruction adherence and alignment with user preferences.


Model Overview

etri-xainlp/SOLAR-10.7B-sft-dpo-v1 is a 10.7 billion parameter language model developed by the ETRI xainlp team. It is built on the davidkim205/nox-solar-10.7b-v4 base model and processes text inputs to generate text outputs.

Key Training Details

This model underwent a two-stage fine-tuning process:

  • Supervised Fine-Tuning (SFT): The model was initially fine-tuned using a LoRA (Low-Rank Adaptation) approach on a substantial dataset comprising 1,821,734 instruction-following examples. This stage aims to enhance the model's ability to follow instructions and generate coherent responses.
  • Direct Preference Optimization (DPO): Following SFT, the model was further optimized using DPO, also with LoRA, on a dataset of 221,869 user preference examples. This stage refines the model's outputs to better align with human preferences and quality judgments.
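The DPO stage optimizes the policy against a frozen reference model so that chosen responses become more likely than rejected ones. The per-pair objective can be sketched in plain Python; the `beta` value and the log-probabilities below are illustrative, not this model's actual training configuration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and
    rejected responses under the policy and the frozen reference
    model; beta controls how far the policy may drift from the
    reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)): shrinks as the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that already prefers the chosen response incurs a lower loss
# than one that is indifferent between the two responses.
loss_aligned = dpo_loss(-10.0, -20.0, -15.0, -15.0)
loss_neutral = dpo_loss(-15.0, -15.0, -15.0, -15.0)  # margin 0 -> log(2)
```

In practice this loss is computed batch-wise over token log-probabilities from two forward passes (policy and reference), with LoRA restricting which weights receive gradients.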

Training was conducted on eight A100 80GB GPUs, indicating a significant computational investment. The combination of SFT and DPO aims to produce a model that can both follow complex instructions and generate outputs that users prefer.

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
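These parameters correspond to standard decoding-time filters applied to the model's next-token logits. A minimal pure-Python sketch of how temperature, top_k, and top_p interact (the values used below are illustrative, not actual user configurations):

```python
import math

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature, top-k, and top-p (nucleus) filtering to raw
    logits and return the resulting next-token sampling distribution."""
    # Temperature scaling: <1 sharpens, >1 flattens the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top-k: only the k most likely tokens are eligible (0 = no limit).
    keep = set(order[:top_k]) if top_k > 0 else set(order)
    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        if i in keep:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
    # Renormalise over the surviving tokens.
    mass = sum(probs[i] for i in nucleus)
    return [probs[i] / mass if i in nucleus else 0.0 for i in range(len(probs))]

# Four-token toy vocabulary: filtering zeroes out the tail tokens.
dist = filter_logits([2.0, 1.0, 0.5, 0.1],
                     temperature=0.7, top_k=2, top_p=0.95)
```

frequency_penalty, presence_penalty, repetition_penalty, and min_p are further logit adjustments applied before this step; the sketch above covers only the three core filters.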