Name: vukien2301/llama-3.1-8b-ultrafeedback-dpo-from-epoch1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: vukien2301

Model Overview

vukien2301/llama-3.1-8b-ultrafeedback-dpo-from-epoch1 is an 8-billion-parameter language model fine-tuned with Direct Preference Optimization (DPO). It is built on a Llama 3.2 base architecture and uses the pvdhihihi/ultra-feedback dataset for its DPO training.
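For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO loss, not the author's training code. The function name `dpo_loss` and the value `beta=0.1` are assumptions for illustration; the model card does not state the beta used. Inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    beta=0.1 is an assumed value; the model card does not report it.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Raising the chosen response's likelihood relative to the reference lowers the loss.
improved = dpo_loss(-10.0, -15.0, -12.0, -15.0)
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, which is the signal the preference dataset supplies.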

Key Training Details

Base Model: Derived from /home/minchan.kwon/ADPA/model/llama3.2-1b-deita-dpomix/ref_teacher_3epochs/checkpoint-191.
Fine-tuning Method: Direct Preference Optimization (DPO).
Dataset: pvdhihihi/ultra-feedback.
Epochs: Trained for 1 epoch.
Learning Rate: 7e-07.
Batch Size: A train_batch_size of 32 and eval_batch_size of 8, with a total_train_batch_size of 256 across 8 GPUs.
Optimizer: AdamW with default betas and epsilon.
Context Length: Supports a context length of 32768 tokens.
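The batch-size figures above can be cross-checked with a short calculation. This is a sketch under two assumptions not stated in the card: that train_batch_size is a per-device value, as in Hugging Face Trainer-generated model cards, and that the gradient accumulation step count is 1, which is what the reported numbers imply. The function name is hypothetical.

```python
def total_train_batch_size(per_device_batch_size: int,
                           num_devices: int,
                           gradient_accumulation_steps: int = 1) -> int:
    """Effective number of examples contributing to each optimizer step."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps


# Values from the training details: 32 per device on 8 GPUs.
# gradient_accumulation_steps=1 is inferred, not reported.
effective = total_train_batch_size(32, 8, gradient_accumulation_steps=1)
assert effective == 256  # matches the reported total_train_batch_size
```

Under these assumptions, each optimizer step at the learning rate of 7e-07 averages gradients over 256 preference pairs.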

Intended Use

This model is primarily intended for applications where alignment with human preferences, as learned through DPO on feedback data, is important. Its DPO fine-tuning suggests it is suited to tasks that require nuanced response generation and adherence to preferred conversational styles or content quality.
