etri-xainlp/llama3-8b-dpo_v1
The etri-xainlp/llama3-8b-dpo_v1 model is an 8 billion parameter language model developed by the ETRI xainlp team, based on Meta's Llama-3-8B architecture. It has been fine-tuned using a combination of supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) with LoRA, utilizing 1.8 million instruction-following examples and 221,000 user preference examples. This model is designed for text-only input and output, focusing on improved instruction following and alignment with user preferences.
Model Overview
etri-xainlp/llama3-8b-dpo_v1 is an 8 billion parameter language model developed by the ETRI xainlp team. It is built on the Meta-Llama-3-8B base model and refined in two fine-tuning stages: supervised fine-tuning followed by preference optimization, both applied with LoRA adapters.
Key Capabilities
- Instruction Following: The model has undergone supervised fine-tuning (SFT) with LoRA using a substantial dataset of 1,821,000 instruction-following examples, significantly improving its ability to understand and execute user commands.
- Preference Alignment: Further refinement was achieved through Direct Preference Optimization (DPO) with LoRA, utilizing 221,000 user preference examples. This process aligns the model's outputs more closely with human preferences and desired behaviors.
- Text-to-Text Generation: Designed for text-only input and output, making it suitable for a wide range of natural language processing tasks.
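Since the model is text-to-text and Llama-3-based, it can presumably be loaded through Hugging Face transformers like any other Llama-3 checkpoint. A minimal sketch, assuming the repository ships standard weights and tokenizer files; the prompt helper hard-codes the usual Llama-3 chat markers for illustration (the model card does not state the exact template used in training):

```python
MODEL_ID = "etri-xainlp/llama3-8b-dpo_v1"

def build_llama3_prompt(user_msg: str) -> str:
    """Format a single-turn prompt with the standard Llama-3 chat markers."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def generate(user_msg: str, max_new_tokens: int = 256) -> str:
    """Download the weights and run generation (an 8B model needs ~16 GB in fp16)."""
    # Heavy imports are kept local so the prompt helper stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_llama3_prompt(user_msg), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```

If the repository includes a chat template in its tokenizer config, `tokenizer.apply_chat_template` would be the safer choice over the hand-built prompt above.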
Training Details
The fine-tuning process involved a two-stage approach:
- SFT + LoRA: Initial training on a large instruction-following dataset.
- DPO + LoRA: Subsequent training on a user preference dataset to enhance alignment.
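The DPO stage optimizes a simple contrastive objective over the 221,000 preference pairs. The card does not give the team's hyperparameters, so the per-example loss below is a generic sketch of the standard DPO formulation, with beta as a hypothetical temperature:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    # How much more the policy prefers the chosen answer over the rejected one,
    # measured relative to the frozen reference (pre-DPO) model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and the reference model agree exactly, the margin is 0 and the loss is log 2; widening the margin toward the chosen answer drives the loss toward 0.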
Training was conducted on 8 NVIDIA A100 GPUs with 80 GB of memory each.
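Both stages rely on LoRA, which freezes the base weights and trains only a low-rank update ΔW = (α/r)·B·A per adapted matrix. A toy, dependency-free sketch of merging such an update back into a weight matrix, using the usual LoRA shapes and scaling (the matrices here are illustrative, not the model's actual adapters):

```python
def lora_merge(W, A, B, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A, computed with plain nested lists.

    W is d x k (frozen base weights), B is d x r, A is r x k.
    """
    d, k = len(W), len(W[0])
    scale = alpha / r
    merged = [row[:] for row in W]  # copy the frozen base weights
    for i in range(d):
        for j in range(k):
            merged[i][j] += scale * sum(B[i][t] * A[t][j] for t in range(r))
    return merged
```

Because r is much smaller than d and k, each adapted matrix trains only (d + k)·r parameters instead of d·k, which is what makes two full fine-tuning passes over an 8B model tractable on 8 GPUs.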