jackf857/llama-3-8b-base-margin-dpo-hh-helpful-batch-64
jackf857/llama-3-8b-base-margin-dpo-hh-helpful-batch-64 is an 8-billion-parameter Llama 3 base model fine-tuned with Margin DPO on the Anthropic/hh-rlhf dataset. Starting from a previously SFT-tuned Llama 3 variant, it is optimized for helpfulness and is intended for tasks that require helpful, preference-aligned responses, with Llama 3's 8192-token context length.
Model Overview
This model, jackf857/llama-3-8b-base-margin-dpo-hh-helpful-batch-64, is an 8-billion-parameter Llama 3 base model fine-tuned with Margin DPO, a margin-based variant of Direct Preference Optimization (DPO), starting from a Supervised Fine-Tuning (SFT) checkpoint of the Llama 3 base model. Training used the Anthropic/hh-rlhf dataset, a collection of human preference comparisons widely used to align models toward helpful and harmless behavior.
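The exact margin formulation used for this checkpoint is not documented. A common variant subtracts a fixed target margin inside the DPO sigmoid, so the loss only saturates once the chosen completion is preferred over the rejected one by at least that margin. A minimal PyTorch sketch, assuming that formulation (the beta and gamma defaults below are illustrative, not taken from this run):

```python
import torch.nn.functional as F

def margin_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    beta=0.1, gamma=1.0):
    """Sketch of a margin DPO loss. Each argument is a tensor of summed
    per-sequence log-probabilities for the chosen (y_w) or rejected (y_l)
    completion, under the policy or the frozen reference model."""
    # Implicit reward of each completion: beta * log(pi_theta / pi_ref)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Gap between chosen and rejected implicit rewards.
    margins = chosen_rewards - rejected_rewards

    # Standard DPO maximizes sigma(margins); the margin variant demands
    # the gap exceed a target gamma before the loss saturates.
    loss = -F.logsigmoid(margins - gamma).mean()
    return loss, margins.mean()
```

Under this reading, the loss margin metric reported below plausibly corresponds to margins.mean(), the average gap between the implicit rewards of the chosen and rejected responses.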
Key Training Details
- Base Model: W-61/llama-3-8b-base-sft-hh-helpful-4xh200
- Fine-tuning Method: Margin DPO
- Dataset: Anthropic/hh-rlhf
- Training Hyperparameters:
  - Learning Rate: 5e-07
  - Total Train Batch Size: 64
  - Number of Epochs: 1
- Evaluation Metrics: final training loss of 0.4046 and a mean reward margin (margin_dpo/loss_margin_mean) of 21.7563, indicating the model learned to clearly separate chosen from rejected responses.
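Assuming the repository ships standard Hugging Face weights and tokenizer files (worth confirming on the repo's files tab), loading follows the usual transformers pattern:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-margin-dpo-hh-helpful-batch-64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's stored dtype
    device_map="auto",   # requires accelerate; shards across available GPUs
)
```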
Intended Use Cases
This model is particularly suited to applications where generating helpful, aligned text is crucial. Its fine-tuning on the Anthropic/hh-rlhf dataset suggests a strong emphasis on producing responses that human evaluators would rate as helpful and harmless. Developers can use it for assistant-style tasks such as dialogue, drafting, and question answering, where preference-aligned generation matters.
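Since the checkpoint descends from a base (non-instruct) Llama 3 model trained on hh-rlhf conversations, prompts formatted in that dataset's Human/Assistant dialogue style are a reasonable starting point. A hedged generation sketch (the prompt and sampling settings below are illustrative):

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="jackf857/llama-3-8b-base-margin-dpo-hh-helpful-batch-64",
    torch_dtype="auto",
    device_map="auto",
)

# hh-rlhf formats dialogues as "\n\nHuman: ...\n\nAssistant: ..." turns.
prompt = "\n\nHuman: How do I write a polite follow-up email?\n\nAssistant:"

out = generator(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,  # return only the newly generated completion
)
print(out[0]["generated_text"])
```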