Model Overview
W-61/llama3-8b-dpo-4xh100-pilot is an 8-billion-parameter language model fine-tuned from princeton-nlp/Llama-3-Base-8B-SFT. It was trained with Direct Preference Optimization (DPO), the method introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", which aligns the model's outputs more closely with human preferences.
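To make the training objective concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. The function name and argument names are illustrative, not part of the TRL API; it assumes you already have summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Illustrative DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp      # implicit reward of chosen response
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # implicit reward of rejected response
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does; beta controls how strongly the policy may deviate from the reference.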
Key Capabilities
- Preference-aligned text generation: Benefits from DPO training to produce outputs that are generally preferred by humans.
- Llama-3 foundation: Built on the Llama-3 8B architecture, providing a strong base for a range of NLP tasks.
- TRL framework: Fine-tuned with the Transformer Reinforcement Learning (TRL) library, which provides the DPO training implementation.
Training Details
The model was trained with DPO using TRL 0.19.1, Transformers 4.57.6, PyTorch 2.6.0+cu126, Datasets 4.8.4, and Tokenizers 0.22.2. Training runs can be visualized in Weights & Biases, as indicated in the original model card.
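For anyone reproducing the environment, the versions above can be pinned in a requirements file; this is an assumed layout (the original card lists versions only, not an install recipe), and the `+cu126` suffix on PyTorch indicates a CUDA 12.6 build that typically comes from the matching PyTorch wheel index rather than a plain pin:

```
trl==0.19.1
transformers==4.57.6
torch==2.6.0
datasets==4.8.4
tokenizers==0.22.2
```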
Good For
- Applications requiring text generation with improved human preference alignment.
- Further experimentation with DPO-trained Llama-3 models.
- General-purpose conversational AI and content creation where nuanced responses are valued.