etri-xainlp/llama2-13b-sft-dpo

Available on Hugging Face

Text generation · Model size: 13B · Quantization: FP8 · Context length: 4k · License: apache-2.0 · Architecture: Transformer · Open weights · Concurrency cost: 1

etri-xainlp/llama2-13b-sft-dpo is a 13 billion parameter language model developed by the ETRI xainlp team, built upon the Llama-2-13b-hf architecture. The model underwent supervised fine-tuning (SFT) on 650k instruction-following examples and was further optimized with Direct Preference Optimization (DPO) on 90k user preference sets. It takes text-only input, produces text-only output, and is designed for tasks that require close adherence to instructions and user preferences.


Overview

etri-xainlp/llama2-13b-sft-dpo is a 13 billion parameter language model developed by the ETRI xainlp team. It is based on meta-llama/Llama-2-13b-hf, the base model from Meta's Llama 2 series. The model processes text-only input and generates text-only output.

Key Capabilities

  • Instruction Following: The model has been extensively fine-tuned using a supervised fine-tuning (SFT) dataset of 650,000 instruction-following examples, enhancing its ability to understand and execute given instructions.
  • Preference Alignment: Further optimization was performed using Direct Preference Optimization (DPO) on a dataset of 90,000 user preference sets, which helps align the model's responses more closely with human preferences.
  • Training Infrastructure: Training was conducted on eight A100 80GB GPUs.
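As a checkpoint hosted on Hugging Face, the model can be loaded with the standard `transformers` API. The sketch below is a minimal, hedged example: the `[INST] ... [/INST]` prompt wrapper follows the generic Llama-2 chat convention and is an assumption here, since ETRI's actual SFT prompt template is not documented on this page.

```python
MODEL_ID = "etri-xainlp/llama2-13b-sft-dpo"


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in a Llama-2 style [INST] block.

    NOTE: this format is an assumption; check the model card for the
    template actually used during ETRI's SFT stage.
    """
    return f"[INST] {instruction} [/INST]"


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the prompt helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Summarize the difference between SFT and DPO in two sentences."))
```

Loading a 13B FP16 checkpoint requires roughly 26 GB of accelerator memory; `device_map="auto"` lets `accelerate` shard it across available devices.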

Good For

  • Applications requiring a model that can accurately follow complex instructions.
  • Use cases where aligning model output with specific user preferences is crucial.
  • General text generation tasks where a Llama 2-based 13B model with SFT and DPO enhancements is beneficial.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. The following sampler parameters are configurable:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
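These parameters map directly onto the request body of an OpenAI-compatible completions API. The sketch below shows one such configuration; the specific values are illustrative defaults chosen as an assumption, not the actual Featherless top-3 settings, which are not reproduced on this page.

```python
# Illustrative sampler configuration for llama2-13b-sft-dpo.
# Values are assumed defaults, not measured user settings.
sampler_config = {
    "temperature": 0.7,         # softens the token distribution (<1 = more focused)
    "top_p": 0.9,               # nucleus sampling: keep smallest set with 90% mass
    "top_k": 40,                # consider only the 40 most likely tokens
    "frequency_penalty": 0.0,   # penalize tokens by how often they already appeared
    "presence_penalty": 0.0,    # penalize tokens that appeared at all
    "repetition_penalty": 1.1,  # multiplicative penalty on repeated tokens
    "min_p": 0.05,              # drop tokens below 5% of the top token's probability
}

# The dict can be passed straight into an OpenAI-compatible chat request,
# e.g. client.chat.completions.create(model=..., messages=..., **sampler_config)
# (subject to which of these keys the serving endpoint accepts).
```

Note that `repetition_penalty` and `min_p` are extensions beyond the core OpenAI parameter set; whether they are accepted depends on the serving backend.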