UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:Jun 25, 2024License:apache-2.0Architecture:Transformer Open Weights Warm

UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2 is an 8 billion parameter instruction-tuned language model developed by UCLA-AGI. It is based on the Meta-Llama-3-8B-Instruct architecture and fine-tuned using Self-Play Preference Optimization (SPPO) at its second iteration. This model is optimized for improved alignment and performance, demonstrating enhanced win rates on the AlpacaEval Leaderboard compared to its predecessor.

Loading preview...

Model Overview

UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2 is an 8 billion parameter instruction-tuned model developed by UCLA-AGI. It is built upon the meta-llama/Meta-Llama-3-8B-Instruct architecture and represents the second iteration of fine-tuning using Self-Play Preference Optimization (SPPO). This method leverages synthetic responses from the openbmb/UltraFeedback dataset for alignment.

Key Capabilities & Performance

  • Self-Play Preference Optimization: Utilizes an iterative self-play approach for alignment, aiming to enhance model performance through preference learning.
  • Improved Alignment: Demonstrates an increased win rate on the AlpacaEval Leaderboard, achieving 35.98% compared to Iter1's 31.74%.
  • General Language Tasks: Shows competitive performance on the Open LLM Leaderboard with an average score of 69.91 across benchmarks like MMLU, Hellaswag, and GSM8k.
  • Synthetic Data Training: Fine-tuned exclusively on synthetic datasets, which can influence its response generation characteristics.

When to Use This Model

This model is suitable for applications requiring a Llama-3-8B-Instruct base with enhanced alignment through SPPO. It can be particularly useful for tasks where improved instruction following and preference-based response generation are critical, especially when comparing performance against earlier SPPO iterations or the base Llama-3-8B-Instruct model.