etri-xainlp/llama2-12.8b_lora-dpo_v1

Available on Hugging Face

Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4k · License: apache-2.0 · Architecture: Transformer · Open weights

etri-xainlp/llama2-12.8b_lora-dpo_v1 is a language model based on Llama-2-13b-hf, developed by the ETRI xainlp team. It was fine-tuned with LoRA in two stages: supervised fine-tuning on 710k instruction-following examples, followed by Direct Preference Optimization (DPO) on 90k user preference examples. It takes text inputs and generates text outputs, making it suitable for tasks that require adherence to instructions and alignment with user preferences.


Model Overview

The etri-xainlp/llama2-12.8b_lora-dpo_v1 is a language model developed by the ETRI xainlp team. It is built on the meta-llama/Llama-2-13b-hf base model, a 13-billion-parameter architecture, and was adapted with LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method that trains only small low-rank update matrices while keeping the base weights frozen.
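The idea behind LoRA can be stated compactly: a frozen weight matrix W is augmented with a low-rank update scaled by alpha/r, so only the two small matrices A and B are trained. The following is a minimal numeric sketch in plain Python; the shapes and values are illustrative and unrelated to this model's actual dimensions.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=1.0, r=1):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen base weight; A (r x d_in) and B (d_out x r)
    are the only trained parameters in a LoRA layer.
    """
    base = matvec(W, x)
    down = matvec(A, x)       # project input down to rank r
    up = matvec(B, down)      # project back up to d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, up)]
```

Because the update has rank r, the number of trainable parameters per layer drops from d_out * d_in to r * (d_in + d_out), which is what makes fine-tuning a 13B model on a modest GPU budget practical.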

Key Training Details

The training regimen for this model involved two distinct phases:

  • Supervised Fine-Tuning (SFT) with LoRA: The model was initially fine-tuned on a substantial dataset of 710,000 instruction-following examples. This phase aims to imbue the model with the ability to understand and execute various instructions.
  • Direct Preference Optimization (DPO) with LoRA: Following SFT, the model was further optimized using DPO on a dataset of 90,000 user preference examples. This DPO phase is crucial for aligning the model's outputs more closely with human preferences and improving response quality.
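The DPO stage optimizes the policy directly on preference pairs, with no separate reward model: for each pair it rewards the policy for favoring the chosen response over the rejected one more strongly than a frozen reference model does. A minimal sketch of the per-pair loss in plain Python (the log-probabilities and beta below are illustrative, not the team's training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin measures how much more the policy (relative to the
    frozen reference model) prefers the chosen response over the
    rejected one; beta controls how hard the policy is pushed away
    from the reference.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # numerically stable -log(sigmoid(z))
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))
```

When the policy favors the chosen response more than the reference does, the margin is positive and the loss drops below log 2; a policy identical to the reference sits exactly at log 2.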

Training was conducted on eight NVIDIA A100 80GB GPUs, enabling efficient processing of the large datasets.

Capabilities and Use Cases

This model is designed to process text-only inputs and generate text-only outputs. Its fine-tuning on instruction-following and user preference datasets suggests its suitability for applications where generating responses that adhere to specific instructions and align with desired user preferences is critical. Potential use cases include:

  • Instruction-based text generation
  • Dialogue systems requiring preference alignment
  • Content creation with specific stylistic or factual constraints
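For instruction-based use, inputs typically need to be wrapped in the base model's chat template. The model card does not document a prompt format, so the helper below assumes the standard Llama-2 chat convention ([INST] ... [/INST] with an optional <<SYS>> block); adjust it if the model expects something else.

```python
def build_prompt(instruction, system=None):
    """Wrap an instruction in the Llama-2 chat template.

    NOTE: this model's card does not specify a prompt template; this
    follows the base Llama-2 convention and is an assumption.
    """
    if system is not None:
        instruction = f"<<SYS>>\n{system}\n<</SYS>>\n\n{instruction}"
    return f"<s>[INST] {instruction} [/INST]"
```

The returned string is what you would feed to the tokenizer; the model's reply is everything generated after the closing [/INST] tag.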

Popular Sampler Settings

The sampler configurations most commonly used by Featherless users for this model adjust the following parameters:

  • temperature – scales the logits before sampling; lower values make output more deterministic
  • top_p – nucleus sampling: restricts sampling to the smallest token set whose cumulative probability reaches top_p
  • top_k – restricts sampling to the k highest-probability tokens
  • frequency_penalty – penalizes tokens in proportion to how often they have already appeared
  • presence_penalty – penalizes any token that has already appeared, encouraging new topics
  • repetition_penalty – discounts the logits of previously generated tokens
  • min_p – discards tokens whose probability falls below min_p times the top token's probability
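To make two of these parameters concrete, here is a generic sketch of temperature scaling and top-p (nucleus) filtering in plain Python. This illustrates how the settings shape the sampling distribution; it is not Featherless's internal implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature < 1 sharpens the
    distribution, temperature > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                     # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize (nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

Lowering temperature concentrates probability on the top token, while lowering top_p shrinks the candidate pool; the two are usually tuned together.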