nbeerbower/llama-3-sauce-v2-8B

Text generation · Model size: 8B · Quant: FP8 · Context length: 8k · License: llama3 · Architecture: Transformer

nbeerbower/llama-3-sauce-v2-8B is an 8-billion-parameter causal language model based on Llama-3-8b, fine-tuned with Direct Preference Optimization (DPO) on several preference datasets. With an 8192-token context length, it targets general text generation, using preference learning to improve on its Llama-3 base.


Model Overview

nbeerbower/llama-3-sauce-v2-8B is an 8 billion parameter language model derived from Llama-3-8b. It has been fine-tuned using Direct Preference Optimization (DPO) on a combination of datasets including jondurbin/truthy-dpo-v0.1, jondurbin/gutenberg-dpo-v0.1, and flammenai/FlameMix-DPO-v1. The model utilizes a context length of 8192 tokens.

Training Methodology

Fine-tuning was performed on an A100 GPU via Google Colab, using a LoRA configuration with r=16 and lora_alpha=16. The DPO trainer was configured with a learning rate of 3e-5, max_steps=4000, and beta=0.1, and the model was trained with the paged_adamw_32bit optimizer in bf16 precision.
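The hyperparameters above can be summarized in one place. This is an illustrative sketch, not the author's training script; the dictionary keys mirror the usual peft `LoraConfig` and trl DPO trainer argument names, but the values are simply those reported in this card.

```python
# Hyperparameters reported in the model card, grouped as plain dicts.
# Key names follow the common peft / trl argument naming; this is a
# summary for reference, not the original training code.
lora_config = {
    "r": 16,           # LoRA rank
    "lora_alpha": 16,  # LoRA scaling factor
}

dpo_config = {
    "learning_rate": 3e-5,
    "max_steps": 4000,
    "beta": 0.1,                    # DPO temperature on the implicit reward
    "optim": "paged_adamw_32bit",   # paged 32-bit AdamW optimizer
    "bf16": True,                   # bfloat16 precision
}

# With alpha == r, the effective LoRA scaling (alpha / r) is 1.0,
# so adapter updates are applied at full strength.
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 1.0
```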

Performance Metrics

Evaluations on the Open LLM Leaderboard show an average score of 70.38. Key benchmark results include:

  • AI2 Reasoning Challenge (25-Shot): 65.61
  • HellaSwag (10-Shot): 83.11
  • MMLU (5-Shot): 67.98
  • TruthfulQA (0-shot): 56.39
  • Winogrande (5-shot): 76.72
  • GSM8k (5-shot): 72.48
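The reported average of 70.38 is the unweighted mean of the six benchmark scores above, which is easy to verify:

```python
# Open LLM Leaderboard scores quoted in this card; the headline 70.38
# is the unweighted mean across the six benchmarks.
scores = {
    "ARC (25-shot)": 65.61,
    "HellaSwag (10-shot)": 83.11,
    "MMLU (5-shot)": 67.98,
    "TruthfulQA (0-shot)": 56.39,
    "Winogrande (5-shot)": 76.72,
    "GSM8k (5-shot)": 72.48,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 70.38
```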

Usage Notes

Users are advised to use the ChatML format for optimal results. The model's training involved specific ChatML formatting for system, user, and assistant messages.
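A minimal sketch of building a ChatML prompt is below. The `<|im_start|>`/`<|im_end|>` tags follow the standard ChatML convention; check the model's tokenizer chat template for the exact variant it expects.

```python
# Minimal ChatML prompt builder (standard ChatML tags assumed; verify
# against the model's own chat template before relying on this format).
def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the assistant turn open so the model generates its reply here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
print(prompt)
```

With Hugging Face `transformers`, `tokenizer.apply_chat_template` applied to the same message list would produce the model's own template automatically.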
