haoranxu/Llama-3-Instruct-8B-SimPO
The haoranxu/Llama-3-Instruct-8B-SimPO model is an 8-billion-parameter instruction-tuned language model fine-tuned from Meta-Llama-3-8B-Instruct. It was trained with the SimPO method on the princeton-nlp/llama3-ultrafeedback dataset to strengthen instruction following and response quality. The model is designed for general-purpose conversational AI and instruction-following tasks and supports an 8192-token context window.
Model Overview
haoranxu/Llama-3-Instruct-8B-SimPO is an 8-billion-parameter instruction-tuned language model built on Meta-Llama-3-8B-Instruct. It was fine-tuned with SimPO (Simple Preference Optimization) on the princeton-nlp/llama3-ultrafeedback preference dataset, with the goal of improving the model's instruction-following ability and the quality of its generated responses.
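The model can be loaded through the standard transformers generation API. The snippet below is a minimal sketch, assuming a recent transformers release (with chat-template support) and a GPU with enough memory for the 8B weights in bfloat16; the prompt and sampling parameters are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/Llama-3-Instruct-8B-SimPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The tokenizer carries the Llama-3 chat template, so chat-style messages
# can be converted to model inputs directly.
messages = [{"role": "user", "content": "Explain beam search in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```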
Key Training Details
- Base Model: Meta-Llama-3-8B-Instruct
- Fine-tuning Dataset: princeton-nlp/llama3-ultrafeedback
- Training Method: SimPO (Simple Preference Optimization)
- Learning Rate: 1e-06
- Batch Size: 2 per device (train), 4 per device (eval), with 8 gradient accumulation steps, for a total train batch size of 256.
- Epochs: 1
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler: Cosine with 0.1 warmup ratio
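For context on the training method: SimPO drops DPO's reference model and instead uses the length-normalized average log-likelihood of a response, scaled by β, as an implicit reward, requiring the chosen response to beat the rejected one by a target margin γ. The following is a minimal PyTorch sketch of that objective; the β and γ defaults are illustrative and are not hyperparameters listed in this card.

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,      # summed log-probs of chosen responses, shape (B,)
    rejected_logps: torch.Tensor,    # summed log-probs of rejected responses, shape (B,)
    chosen_lengths: torch.Tensor,    # token counts of chosen responses, shape (B,)
    rejected_lengths: torch.Tensor,  # token counts of rejected responses, shape (B,)
    beta: float = 2.5,               # reward scale (illustrative, not from this card)
    gamma: float = 1.4,              # target reward margin (illustrative)
) -> torch.Tensor:
    """SimPO objective: reference-free, length-normalized implicit reward."""
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths
    # Penalize pairs where the chosen reward does not exceed the
    # rejected reward by at least the margin gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```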
Intended Use Cases
This model is well-suited to general-purpose conversational AI applications and tasks that require precise instruction following. Because it was fine-tuned on a preference dataset, it should be better aligned with human preferences, making it potentially more effective at producing helpful and harmless outputs.