etri-xainlp/llama2-12.8b_lora-dpo_v1

Available on Hugging Face

Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4k · License: apache-2.0 · Architecture: Transformer · Open weights

etri-xainlp/llama2-12.8b_lora-dpo_v1 is a language model based on Llama-2-13b-hf, developed by the ETRI xainlp team. It was fine-tuned with LoRA in two stages: supervised fine-tuning on 710k instruction-following examples, followed by Direct Preference Optimization (DPO) on 90k user preference examples. It takes text inputs and generates text outputs, making it suitable for tasks that require adherence to instructions and alignment with user preferences.


Model Overview

The etri-xainlp/llama2-12.8b_lora-dpo_v1 is a language model developed by the ETRI xainlp team. It is built on the meta-llama/Llama-2-13b-hf base model, a 13-billion-parameter architecture, and was adapted with LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method that trains only small low-rank update matrices while keeping the base weights frozen.
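The idea behind LoRA can be stated compactly: a frozen weight matrix W is augmented with a low-rank update scaled by alpha/r, so only the two small matrices A and B are trained. The following is a minimal numeric sketch in plain Python; the shapes and values are illustrative and unrelated to this model's actual dimensions.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=1.0, r=1):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen base weight; A (r x d_in) and B (d_out x r)
    are the only trained parameters in a LoRA layer.
    """
    base = matvec(W, x)
    down = matvec(A, x)       # project input down to rank r
    up = matvec(B, down)      # project back up to d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, up)]
```

Because the update has rank r, the number of trainable parameters per layer drops from d_out * d_in to r * (d_in + d_out), which is what makes fine-tuning a 13B model on a modest GPU budget practical.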

Key Training Details

The training regimen for this model involved two distinct phases:

  • Supervised Fine-Tuning (SFT) with LoRA: The model was initially fine-tuned on a substantial dataset of 710,000 instruction-following examples. This phase aims to imbue the model with the ability to understand and execute various instructions.
  • Direct Preference Optimization (DPO) with LoRA: Following SFT, the model was further optimized using DPO on a dataset of 90,000 user preference examples. This DPO phase is crucial for aligning the model's outputs more closely with human preferences and improving response quality.
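The DPO stage optimizes the policy directly on preference pairs, with no separate reward model: for each pair it rewards the policy for favoring the chosen response over the rejected one more strongly than a frozen reference model does. A minimal sketch of the per-pair loss in plain Python (the log-probabilities and beta below are illustrative, not the team's training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin measures how much more the policy (relative to the
    frozen reference model) prefers the chosen response over the
    rejected one; beta controls how hard the policy is pushed away
    from the reference.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # numerically stable -log(sigmoid(z))
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))
```

When the policy favors the chosen response more than the reference does, the margin is positive and the loss drops below log 2; a policy identical to the reference sits exactly at log 2.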

Training was conducted on eight NVIDIA A100 80GB GPUs, enabling efficient processing of the large datasets.

Capabilities and Use Cases

This model is designed to process text-only inputs and generate text-only outputs. Its fine-tuning on instruction-following and user preference datasets suggests its suitability for applications where generating responses that adhere to specific instructions and align with desired user preferences is critical. Potential use cases include:

  • Instruction-based text generation
  • Dialogue systems requiring preference alignment
  • Content creation with specific stylistic or factual constraints
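For instruction-based use, inputs typically need to be wrapped in the base model's chat template. The model card does not document a prompt format, so the helper below assumes the standard Llama-2 chat convention ([INST] ... [/INST] with an optional <<SYS>> block); adjust it if the model expects something else.

```python
def build_prompt(instruction, system=None):
    """Wrap an instruction in the Llama-2 chat template.

    NOTE: this model's card does not specify a prompt template; this
    follows the base Llama-2 convention and is an assumption.
    """
    if system is not None:
        instruction = f"<<SYS>>\n{system}\n<</SYS>>\n\n{instruction}"
    return f"<s>[INST] {instruction} [/INST]"
```

The returned string is what you would feed to the tokenizer; the model's reply is everything generated after the closing [/INST] tag.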

Popular Sampler Settings

The sampler configurations most commonly used by Featherless users for this model adjust the following parameters:

  • temperature – scales the logits before sampling; lower values make output more deterministic
  • top_p – nucleus sampling: restricts sampling to the smallest token set whose cumulative probability reaches top_p
  • top_k – restricts sampling to the k highest-probability tokens
  • frequency_penalty – penalizes tokens in proportion to how often they have already appeared
  • presence_penalty – penalizes any token that has already appeared, encouraging new topics
  • repetition_penalty – discounts the logits of previously generated tokens
  • min_p – discards tokens whose probability falls below min_p times the top token's probability
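To make two of these parameters concrete, here is a generic sketch of temperature scaling and top-p (nucleus) filtering in plain Python. This illustrates how the settings shape the sampling distribution; it is not Featherless's internal implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature < 1 sharpens the
    distribution, temperature > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                     # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize (nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

Lowering temperature concentrates probability on the top token, while lowering top_p shrinks the candidate pool; the two are usually tuned together.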