W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.3-20260428-045924
W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.3-20260428-045924 is an 8-billion-parameter language model fine-tuned from W-61/llama-3-8b-base-sft-ultrachat-8xh200 with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. The DPO stage aligns the model with human preferences, making it suitable for conversational AI and instruction-following tasks.
Model Overview
This model is a fine-tuned variant of W-61/llama-3-8b-base-sft-ultrachat-8xh200, further optimized with Direct Preference Optimization (DPO) so that its responses better reflect the human preference judgments collected in UltraFeedback.
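For quick experimentation, the model can be loaded with the standard transformers API. The following is a minimal generation sketch, assuming the repository ships a tokenizer whose chat template was inherited from the SFT base model; the prompt content is illustrative.

```python
# Minimal generation sketch (assumes the tokenizer provides a chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.3-20260428-045924"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 8B model within a single modern GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain Direct Preference Optimization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```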
Key Training Details
- Base Model: Fine-tuned from W-61/llama-3-8b-base-sft-ultrachat-8xh200.
- Fine-tuning Method: Direct Preference Optimization (DPO).
- Dataset: Trained on the HuggingFaceH4/ultrafeedback_binarized dataset.
- Training Hyperparameters (a configuration sketch follows this list):
- Learning Rate: 5e-07
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: 4 per device (train) and 2 per device (eval); with 8 gradient accumulation steps across 4 GPUs, the effective batch size is 128 for training (4 × 8 × 4) and 8 for evaluation (2 × 4).
- Epochs: 1
- Evaluation Performance: Reached a validation loss of 0.6247 on the evaluation set, alongside the standard DPO reward and margin metrics used to gauge preference alignment.
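For reference, the hyperparameters above map naturally onto trl's DPOTrainer. The sketch below is a reconstruction under that assumption, not the authors' actual training script: values not listed on this card (such as the DPO beta, warmup schedule, and the q_t/s_star parameters in the model name) are left at library defaults or omitted.

```python
# Hedged reconstruction of the training setup with trl's DPOTrainer.
# Anything not stated on the model card stays at trl defaults.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/llama-3-8b-base-sft-ultrachat-8xh200"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# UltraFeedback binarized preference pairs (chosen vs. rejected responses).
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
eval_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="test_prefs")

config = DPOConfig(
    output_dir="llama-3-8b-dpo-ultrafeedback",
    learning_rate=5e-7,             # AdamW with betas=(0.9, 0.999), eps=1e-8 is the default
    per_device_train_batch_size=4,  # x 8 accumulation steps x 4 GPUs = 128 effective
    per_device_eval_batch_size=2,   # x 4 GPUs = 8 effective
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,  # named `tokenizer` in older trl releases
)
trainer.train()
```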
Intended Use Cases
This model is primarily intended for applications requiring strong instruction following and human preference alignment, such as:
- Conversational AI: Generating coherent and contextually relevant responses in dialogue systems.
- Instruction Following: Executing complex instructions and producing desired outputs based on user prompts.
- Preference-aligned Generation: Creating content that is preferred by human evaluators, leveraging its DPO fine-tuning.
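As an illustration of the conversational use case, a multi-turn exchange can be driven through the transformers text-generation pipeline, which accepts chat-style message lists in recent releases; the prompts here are purely illustrative.

```python
# Hypothetical multi-turn dialogue sketch using the text-generation pipeline.
from transformers import pipeline

chatbot = pipeline(
    "text-generation",
    model="W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.3-20260428-045924",
    device_map="auto",
)

conversation = [{"role": "user", "content": "Suggest a weekend itinerary for Kyoto."}]
# The pipeline returns the full message list with the assistant's reply appended.
reply = chatbot(conversation, max_new_tokens=256)[0]["generated_text"][-1]
print(reply["content"])

# Carry the reply forward to continue the dialogue.
conversation.append(reply)
conversation.append({"role": "user", "content": "Make day two food-focused."})
print(chatbot(conversation, max_new_tokens=256)[0]["generated_text"][-1]["content"])
```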