berkeley-nest/Starling-LM-7B-alpha
Starling-LM-7B-alpha is a 7-billion-parameter language model developed by Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, and Jiantao Jiao, fine-tuned from Openchat 3.5 (itself based on Mistral-7B-v0.1). The model is optimized with Reinforcement Learning from AI Feedback (RLAIF) using the advantage-induced policy alignment (APA) method and the GPT-4-labeled Nectar ranking dataset. It achieves an MT-Bench score of 8.09 as judged by GPT-4, outperforming many models in its class and excelling in helpfulness and harmlessness.
Overview
Starling-LM-7B-alpha is a 7-billion-parameter language model developed by Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, and Jiantao Jiao. It is fine-tuned from Openchat 3.5, which is itself based on Mistral-7B-v0.1. The key differentiator for Starling-7B is its training methodology: Reinforcement Learning from AI Feedback (RLAIF), with the advantage-induced policy alignment (APA) method used for policy optimization. Training draws on Nectar, a ranking dataset labeled by GPT-4.
Key Capabilities & Performance
- RLAIF Optimization: Trained with Reinforcement Learning from AI Feedback, enhancing helpfulness and harmlessness.
- High MT-Bench Score: Achieves an MT-Bench score of 8.09, placing it among the top-performing open models and outperforming many larger models, as judged by GPT-4.
- Policy Optimization: Employs the advantage-induced policy alignment (APA) method for fine-tuning; a schematic of the objective follows this list.
- Open-source Components: Releases the Nectar ranking dataset and the Starling-RM-7B-alpha reward model alongside the language model.
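
APA frames alignment as KL-regularized RL, which admits a closed-form target policy. The block below is a schematic paraphrase of the APA paper's squared-error alignment idea, not the authors' exact formulation: here A(s, a) is an advantage estimate derived from the reward model, λ is a temperature, π_init is the initial (supervised fine-tuned) policy, and Z(s) is a normalizer that the practical loss drops.

```latex
% Target policy of KL-regularized RL: exponentially tilt the
% initial policy by the scaled advantage.
\pi^{*}(a \mid s) \;=\; \frac{\pi_{\mathrm{init}}(a \mid s)\,
  \exp\!\bigl(A(s, a)/\lambda\bigr)}{Z(s)}

% APA fits the parameterized policy to this target with a squared
% log-ratio loss (Z(s) is dropped in practice) rather than a
% PPO-style clipped surrogate:
\mathcal{L}_{\mathrm{APA}}(\theta) \;=\;
  \mathbb{E}_{(s,a)}\!\left[\left(\log \pi_{\theta}(a \mid s)
  - \log \pi_{\mathrm{init}}(a \mid s)
  - \frac{A(s, a)}{\lambda}\right)^{2}\right]
```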
Usage & Integration
- Chat Template: Requires the exact chat template used by Openchat 3.5 for optimal performance; a usage sketch follows this list.
- Accessibility: Available for testing on the LMSYS Chatbot Arena.
- License: Apache-2.0, under the condition that the model is not used to compete with OpenAI.
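
As a minimal sketch of single-turn usage with the Openchat 3.5 template (the prompt format follows the model card; generation parameters here are illustrative assumptions, not recommendations from the authors):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Single-turn Openchat 3.5 template: user turn, end-of-turn token,
# then the assistant tag that the model completes.
prompt = "GPT4 Correct User: Hello, how are you?<|end_of_turn|>GPT4 Correct Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # illustrative cap, not from the model card
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens (the assistant's reply).
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(reply)
```

For multi-turn conversations the same pattern repeats, with each completed turn closed by <|end_of_turn|> before the next GPT4 Correct User: or GPT4 Correct Assistant: tag; if the packaged tokenizer config ships a chat template, tokenizer.apply_chat_template can assemble this string instead.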