UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3

Text generation · Model size: 8B · Quantization: FP8 · Context length: 8K · Published: Jun 25, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 is an 8 billion parameter instruction-tuned language model developed by UCLA-AGI, based on Meta-Llama-3-8B-Instruct. It was fine-tuned using Self-Play Preference Optimization (SPPO) over three iterations, utilizing synthetic datasets derived from UltraFeedback prompts. This model demonstrates improved alignment and performance on benchmarks like AlpacaEval and Open LLM Leaderboard compared to its previous iterations, making it suitable for general instruction-following tasks.


Overview

UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 is an 8 billion parameter instruction-tuned model developed by UCLA-AGI. It is built upon the meta-llama/Meta-Llama-3-8B-Instruct architecture and has undergone three iterations of fine-tuning using Self-Play Preference Optimization (SPPO). The training utilized synthetic responses generated from the openbmb/UltraFeedback dataset, specifically split into three parts for iterative refinement.
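As a fine-tune of Meta-Llama-3-8B-Instruct, the model should follow the standard Llama 3 instruct chat format. The sketch below builds such a prompt by hand and shows a hedged local-inference path with `transformers`; the template string is an assumption based on the Llama 3 convention (in practice, `tokenizer.apply_chat_template` produces it for you), and the generation settings are illustrative.

```python
# Sketch: single-turn prompt in the Llama 3 instruct chat format (assumed to
# be inherited unchanged by this SPPO fine-tune), plus a guarded inference path.

MODEL_ID = "UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3"

def format_llama3_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt following the Llama 3 instruct template."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def run_inference(user_message: str) -> str:
    """Load the model and generate a reply. Needs a GPU with ~16 GB for bf16."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    prompt = format_llama3_prompt("You are a helpful assistant.", user_message)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

Call `run_inference("...")` on a machine with sufficient GPU memory; the prompt formatter itself is pure string construction and runs anywhere.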

Key Capabilities & Performance

This model shows progressive improvements across its iterations. On the AlpacaEval Leaderboard, Iter3 achieves a length-controlled (LC) win rate of 38.77% and a raw win rate of 39.85%, outperforming Iter1 and Iter2. Similarly, on the Open LLM Leaderboard, Iter3 records an average score of 70.29%, with notable scores on arc_challenge (65.19%) and hellaswag (80.86%). The model is primarily English-language focused and is licensed under Apache-2.0.

Training Methodology

The SPPO method aims to enhance language model alignment. Training used a learning rate of 5e-07, the RMSProp optimizer, and a linear learning-rate scheduler with a warmup ratio of 0.1. The iterative training process, as detailed in the associated research paper, leverages self-play to refine model preferences.
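The scheduler described above can be sketched as a small pure function: linear warmup over the first 10% of steps to the peak rate of 5e-07, then linear decay to zero. The total step count here is illustrative, not taken from the paper.

```python
# Minimal sketch of a linear warmup + linear decay learning-rate schedule,
# matching the hyperparameters above (peak lr 5e-07, warmup ratio 0.1).
# The total-step count is a placeholder for illustration.

PEAK_LR = 5e-07
WARMUP_RATIO = 0.1

def lr_at(step: int, total_steps: int) -> float:
    """LR at a given step: ramp up over the warmup window, then decay to zero."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * (total_steps - step) / max(1, total_steps - warmup_steps)

# With 1000 total steps, the peak rate is reached at step 100.
schedule = [lr_at(s, 1000) for s in range(1001)]
```

This mirrors the behavior of the "linear" scheduler type commonly used in Hugging Face training setups.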

Good For

  • General instruction-following tasks
  • Applications requiring improved alignment compared to base Llama-3-8B-Instruct
  • Research into Self-Play Preference Optimization techniques

Popular Sampler Settings

The parameter combinations most commonly used by Featherless users for this model adjust the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
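These settings map directly onto a request body for an OpenAI-compatible chat completions endpoint (which Featherless provides). The sketch below builds such a payload; the sampler values are placeholders for illustration, not the actual top user configurations, and extended fields like top_k, repetition_penalty, and min_p are accepted by many compatible servers but are not part of the core OpenAI schema.

```python
import json

# Placeholder sampler values -- illustrative only, not the real top
# Featherless configurations for this model.
payload = {
    "model": "UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3",
    "messages": [{"role": "user", "content": "Explain SPPO in one sentence."}],
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}

# Serialize for an HTTP POST to the provider's /v1/chat/completions endpoint.
body = json.dumps(payload)
```

Send `body` with your API key via any HTTP client; only the seven sampler fields above vary between the popular configurations.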