Name: HCY123902/llama-3-8b-inst-dpo-on-p-tw31-beta-2.5e-0-ift API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: HCY123902

Overview

This model, HCY123902/llama-3-8b-inst-dpo-on-p-tw31-beta-2.5e-0-ift, is an 8 billion parameter instruction-tuned language model. It is a fine-tuned variant of the meta-llama/Meta-Llama-3-8B-Instruct base model, developed by HCY123902.

Training Methodology

The model was trained using Direct Preference Optimization (DPO), a method detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model." This technique aims to align the model's outputs with human preferences more effectively than traditional reinforcement learning from human feedback (RLHF) methods. The training process utilized the TRL library.

Key Characteristics

Base Model: Meta-Llama-3-8B-Instruct
Parameter Count: 8 billion
Context Length: 8192 tokens
Fine-tuning: Direct Preference Optimization (DPO)

Intended Use Cases

This model is suitable for various text generation tasks where high-quality, preference-aligned responses are desired. Its instruction-tuned nature makes it effective for following prompts and generating coherent, relevant text.

Overview

Overview

Training Methodology

Key Characteristics

Intended Use Cases

Full Model Card (README)