etri-xainlp/llama3-8b-dpo_v1

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8K · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

The etri-xainlp/llama3-8b-dpo_v1 model is an 8 billion parameter language model developed by the ETRI xainlp team, based on Meta's Llama-3-8B architecture. It has been fine-tuned using a combination of supervised fine-tuning (SFT) and Direct Preference Optimization (DPO), both with LoRA, on approximately 1.8 million instruction-following examples and 221,000 user preference examples. The model is designed for text-only input and output, with a focus on improved instruction following and alignment with user preferences.


Model Overview

etri-xainlp/llama3-8b-dpo_v1 is an 8 billion parameter language model developed by the ETRI xainlp team. It is built upon the robust Meta-Llama-3-8B base model, enhancing its capabilities through a specialized fine-tuning process.

Key Capabilities

  • Instruction Following: The model has undergone supervised fine-tuning (SFT) with LoRA using a substantial dataset of 1,821,000 instruction-following examples, significantly improving its ability to understand and execute user commands.
  • Preference Alignment: Further refinement was achieved through Direct Preference Optimization (DPO) with LoRA, utilizing 221,000 user preference examples. This process aligns the model's outputs more closely with human preferences and desired behaviors.
  • Text-to-Text Generation: Designed for text-only input and output, making it suitable for a wide range of natural language processing tasks.
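Since the model is text-in/text-out and derived from Llama-3-8B, prompts are typically assembled in the Llama 3 Instruct chat format. The exact template this fine-tune expects is not documented in the card, so the special tokens below are an assumption based on Meta's published format, and `build_llama3_prompt` is a hypothetical helper:

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 Instruct format.
    NOTE: assumed template; verify against the model's tokenizer config."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Generation continues from the assistant header.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful assistant.",
    "Summarize DPO in one sentence.",
)
```

In practice, `tokenizer.apply_chat_template` from the `transformers` library produces this string automatically when a chat template is bundled with the model.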

Training Details

The fine-tuning process involved a two-stage approach:

  1. SFT + LoRA: Initial training on a large instruction-following dataset.
  2. DPO + LoRA: Subsequent training on a user preference dataset to enhance alignment.
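The second stage optimizes the standard DPO objective: the policy is rewarded for widening its log-probability gap over the reference model on the chosen answer relative to the rejected one. As a sketch of the math only (not the team's training code; `beta=0.1` is an assumed value, matching the common default):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)), where the margin
    compares the policy-vs-reference log-prob gaps on the chosen and
    rejected completions."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy prefers the chosen answer more strongly than the reference does, the margin is positive and the loss falls below log(2); with no preference shift the loss is exactly log(2).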

Training was conducted on 8 NVIDIA A100 GPUs with 80 GB of memory each, a substantial computational investment for an 8B-parameter fine-tune.

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model cover the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
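As a rough sketch of what the truncation-style parameters above do (the history-dependent penalties — frequency, presence, repetition — are omitted, and the logit values are illustrative, since the widget lists only parameter names), here is a minimal pure-Python temperature/top-k/top-p filter:

```python
import math

def filter_logits(logits, top_k=0, top_p=1.0, temperature=1.0):
    """Turn raw logits into a probability distribution after temperature
    scaling, top-k truncation, and top-p (nucleus) truncation."""
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k > 0:
        # Keep only the k most probable tokens.
        keep &= set(order[:top_k])
    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative mass >= top_p.
        cum, nucleus = 0.0, []
        for i in order:
            nucleus.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= set(nucleus)
    # Zero out dropped tokens and renormalize.
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(masked)
    return [p / z for p in masked]
```

Lower `temperature` sharpens the distribution before truncation; `top_k` and `top_p` then discard the low-probability tail before sampling.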