Model Overview
UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2 is an 8-billion-parameter instruction-tuned model developed by UCLA-AGI. It is built on meta-llama/Meta-Llama-3-8B-Instruct and represents the second iteration of fine-tuning with Self-Play Preference Optimization (SPPO). Alignment training uses prompts drawn from the openbmb/UltraFeedback dataset, with all responses generated synthetically by the model itself.
Key Capabilities & Performance
- Self-Play Preference Optimization: Uses an iterative self-play approach to alignment, in which the model generates its own responses and is optimized against estimated pairwise preferences (see the sketch after this list).
- Improved Alignment: Achieves a higher win rate on the AlpacaEval 2.0 leaderboard than its predecessor: 35.98% versus Iter1's 31.74%.
- General Language Tasks: Shows competitive performance on the Open LLM Leaderboard, with an average score of 69.91 across benchmarks such as MMLU, HellaSwag, and GSM8K.
- Synthetic Data Training: Fine-tuned exclusively on synthetic, model-generated responses, which may shape the style and characteristics of its outputs relative to models trained on human-written data.
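To make the self-play objective concrete, below is a minimal sketch of the per-example SPPO loss as described in the SPPO paper: the log-probability ratio between the current policy and the previous iteration's frozen policy is regressed toward a scaled, centered preference estimate. The function name `sppo_loss`, the tensor layout, and the default `eta` here are illustrative assumptions, not the authors' released training code.

```python
import torch

def sppo_loss(logp_theta: torch.Tensor,
              logp_prev: torch.Tensor,
              pref_est: torch.Tensor,
              eta: float = 1e3) -> torch.Tensor:
    """Illustrative per-example SPPO objective (not the official code).

    logp_theta: log pi_theta(y|x) under the policy being trained
    logp_prev:  log pi_t(y|x) under the previous iteration's frozen policy
    pref_est:   estimated probability that response y beats the previous
                policy's responses, e.g. from a pairwise preference model
    eta:        scaling hyperparameter
    """
    # Regress the log-ratio toward eta * (preference - 1/2):
    # responses preferred over the old policy are pushed up, others down.
    target = eta * (pref_est - 0.5)
    log_ratio = logp_theta - logp_prev
    return ((log_ratio - target) ** 2).mean()
```

Iterating this procedure, with each round's policy serving as the next round's frozen reference, is what distinguishes Iter2 from Iter1.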
When to Use This Model
This model is suitable for applications that require a Llama-3-8B-Instruct base with the stronger alignment provided by SPPO. It is particularly useful where improved instruction following and preference-aligned responses are critical, or when benchmarking against earlier SPPO iterations or the base Llama-3-8B-Instruct model. A minimal loading and generation example follows.
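As a starting point, the checkpoint can be loaded with the Hugging Face transformers library like any Llama-3-Instruct model. The snippet below is a minimal sketch; the prompt and generation parameters are illustrative, not tuned recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama-3-Instruct checkpoints ship a chat template; use it to format prompts.
messages = [
    {"role": "user",
     "content": "Explain self-play preference optimization in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```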