W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.45-20260427-221551
W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.45-20260427-221551 is an 8 billion parameter language model developed by W-61, fine-tuned from a Llama 3 base model. It was further optimized with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset and is intended for tasks that benefit from preference-based fine-tuning; DPO evaluation metrics are reported in the sections that follow.
Model Overview
This model, developed by W-61, is an 8 billion parameter language model derived from a Llama 3 base architecture. It has undergone a specific fine-tuning process using Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset.
Key Characteristics
- Base Model: Fine-tuned from W-61/llama-3-8b-base-sft-ultrachat-8xh200.
- Optimization Method: Direct Preference Optimization (DPO) for alignment.
- Training Data: The HuggingFaceH4/ultrafeedback_binarized dataset.
- Evaluation Metrics: A loss of 0.5654 on the evaluation set, along with DPO-specific metrics including a margin mean of 76.3970.
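The evaluation loss and margin above come from the standard DPO objective, in which implicit rewards are beta-scaled log-probability ratios between the policy and a frozen reference model. A minimal sketch of that computation for a single preference pair (the beta value here is an illustrative assumption, not the one used for this model):

```python
import math

def dpo_loss_and_margin(policy_chosen_logp, policy_rejected_logp,
                        ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss and reward margin for one (chosen, rejected) pair.

    Each implicit reward is beta * (policy log-prob - reference log-prob)
    of the full completion; the loss pushes the margin to be positive.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log sigmoid(margin): shrinks as the chosen completion is preferred more.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, margin

# When policy and reference agree exactly, the margin is 0 and the
# loss is log(2) ≈ 0.6931 -- the starting point of DPO training.
loss, margin = dpo_loss_and_margin(-10.0, -12.0, -10.0, -12.0)
```

The reported "margin mean" is this margin averaged over the evaluation set; a large positive value indicates the policy has moved strongly toward the chosen responses relative to the reference.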
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 128, using the ADAMW_TORCH optimizer with a cosine learning rate scheduler.
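The cosine schedule decays the learning rate from its 5e-07 peak to zero over the run. A minimal sketch of that decay in plain Python (warmup behavior and the step count are assumptions: no warmup is applied, and the total-step estimate below is illustrative, not reported for this model):

```python
import math

def cosine_lr(step, total_steps, peak_lr=5e-07, min_lr=0.0):
    """Cosine-decay learning rate: peak_lr at step 0, min_lr at the last step."""
    progress = step / max(total_steps - 1, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Illustrative step count only: one epoch at batch size 128 over a
# preference dataset of ~61k pairs would be roughly 61000 // 128 steps.
total_steps = 61000 // 128
lrs = [cosine_lr(s, total_steps) for s in range(total_steps)]
```

The schedule starts exactly at the configured peak, passes through roughly half the peak at the midpoint, and reaches zero on the final step.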
Intended Use Cases
Specific intended uses are not detailed in the provided information. In general, however, models fine-tuned with DPO on preference datasets are well suited to tasks that require close alignment with human preferences, such as instruction following, dialogue generation, and content moderation.