haoranxu/Llama-3-Instruct-8B-SimPO

Hosted on Hugging Face

  • Task: Text generation
  • Model size: 8B parameters
  • Quantization: FP8
  • Context length: 8k
  • Published: Jun 7, 2024
  • License: llama3
  • Architecture: Transformer

The haoranxu/Llama-3-Instruct-8B-SimPO model is an 8 billion parameter instruction-tuned language model, fine-tuned from Meta-Llama-3-8B-Instruct. It leverages the SimPO training method on the princeton-nlp/llama3-ultrafeedback dataset, enhancing its ability to follow instructions and generate high-quality responses. This model is designed for general-purpose conversational AI and instruction-following tasks, offering an 8,192-token context window.


Model Overview

haoranxu/Llama-3-Instruct-8B-SimPO is an 8 billion parameter instruction-tuned language model, building upon the robust Meta-Llama-3-8B-Instruct architecture. This model has been specifically fine-tuned using the SimPO (Simple Preference Optimization) method, leveraging the comprehensive princeton-nlp/llama3-ultrafeedback dataset. The fine-tuning process aims to enhance the model's instruction-following capabilities and improve the quality of its generated responses.

Key Training Details

  • Base Model: Meta-Llama-3-8B-Instruct
  • Fine-tuning Dataset: princeton-nlp/llama3-ultrafeedback
  • Training Method: SimPO (Simple Preference Optimization)
  • Learning Rate: 1e-06
  • Batch Size: 2 per device (train), 4 per device (eval), with 8 gradient accumulation steps, for a total train batch size of 256 across devices.
  • Epochs: 1
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler: Cosine with 0.1 warmup ratio
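Unlike DPO, the SimPO objective needs no reference model: it scores each response by its length-normalized average log-probability and pushes the chosen response's reward above the rejected one's by a target margin. A minimal sketch of the per-pair loss follows; the `beta` and `gamma` values and the example log-probabilities are illustrative, not the hyperparameters used to train this checkpoint:

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=1.0):
    """SimPO per-pair loss: -log sigmoid(reward margin - gamma).

    Rewards are length-normalized sequence log-probabilities scaled by
    beta, so no reference model is required (unlike DPO). The beta and
    gamma defaults here are illustrative placeholders.
    """
    r_chosen = beta * logp_chosen / len_chosen        # length-normalized reward
    r_rejected = beta * logp_rejected / len_rejected
    margin = r_chosen - r_rejected - gamma            # target reward margin gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin))) # -log sigmoid(margin)

# The loss shrinks as the chosen response's normalized log-prob rises
# relative to the rejected one's.
low = simpo_loss(-10.0, 20, -40.0, 20)   # chosen strongly preferred
high = simpo_loss(-40.0, 20, -10.0, 20)  # preference reversed
```

Length normalization is the key design choice: dividing by response length removes the bias toward longer responses that raw sequence log-probabilities would otherwise introduce.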

Intended Use Cases

This model is well-suited for a variety of general-purpose conversational AI applications and tasks requiring precise instruction following. Its fine-tuning on a preference dataset suggests improved alignment with human preferences, making it potentially more effective in generating helpful and harmless outputs.
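Because the model inherits Llama 3's chat template, conversational prompts follow the instruct format's special tokens. A minimal pure-Python sketch of that format is below; the helper name and example messages are illustrative, and in practice `tokenizer.apply_chat_template(..., add_generation_prompt=True)` handles this for you:

```python
def build_llama3_prompt(messages):
    """Assemble a Llama 3 instruct prompt from role/content messages.

    Mirrors the Llama 3 chat template; prefer the tokenizer's
    apply_chat_template in real use.
    """
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                     f"{m['content']}<|eot_id|>")
    # Trailing assistant header cues the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize SimPO in one sentence."},
])
```

The assembled string is what the model consumes at generation time; the whole conversation, including these control tokens, must fit within the 8k context window.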