berkeley-nest/Starling-LM-7B-alpha
Starling-LM-7B-alpha is a 7-billion-parameter language model developed by Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, and Jiantao Jiao, fine-tuned from Openchat 3.5 (itself based on Mistral-7B-v0.1). The model is optimized with Reinforcement Learning from AI Feedback (RLAIF) using the advantage-induced policy alignment (APA) method and the GPT-4-labeled Nectar ranking dataset. It achieves an MT-Bench score of 8.09 as judged by GPT-4, outperforming many models in its class and excelling in helpfulness and harmlessness.
Overview
Starling-LM-7B-alpha is a 7-billion-parameter language model developed by Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, and Jiantao Jiao. It is fine-tuned from Openchat 3.5, which is itself based on Mistral-7B-v0.1. The key differentiator for Starling-7B is its training methodology: Reinforcement Learning from AI Feedback (RLAIF), with the advantage-induced policy alignment (APA) method used for policy optimization. Training draws on Nectar, a ranking dataset labeled by GPT-4.
Key Capabilities & Performance
- RLAIF Optimization: Trained with Reinforcement Learning from AI Feedback, enhancing helpfulness and harmlessness.
- High MT-Bench Score: Achieves an MT-Bench score of 8.09, placing it among the top-performing open models and outperforming many larger models, as judged by GPT-4.
- Policy Optimization: Employs the advantage-induced policy alignment (APA) method for fine-tuning; a schematic of the objective follows this list.
- Open-source Components: Releases the Nectar ranking dataset and the Starling-RM-7B-alpha reward model alongside the language model.
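
APA frames alignment as KL-regularized RL, which admits a closed-form target policy. The block below is a schematic paraphrase of the APA paper's squared-error alignment idea, not the authors' exact formulation: here A(s, a) is an advantage estimate derived from the reward model, λ is a temperature, π_init is the initial (supervised fine-tuned) policy, and Z(s) is a normalizer that the practical loss drops.

```latex
% Target policy of KL-regularized RL: exponentially tilt the
% initial policy by the scaled advantage.
\pi^{*}(a \mid s) \;=\; \frac{\pi_{\mathrm{init}}(a \mid s)\,
  \exp\!\bigl(A(s, a)/\lambda\bigr)}{Z(s)}

% APA fits the parameterized policy to this target with a squared
% log-ratio loss (Z(s) is dropped in practice) rather than a
% PPO-style clipped surrogate:
\mathcal{L}_{\mathrm{APA}}(\theta) \;=\;
  \mathbb{E}_{(s,a)}\!\left[\left(\log \pi_{\theta}(a \mid s)
  - \log \pi_{\mathrm{init}}(a \mid s)
  - \frac{A(s, a)}{\lambda}\right)^{2}\right]
```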
Usage & Integration
- Chat Template: Requires the exact chat template used by Openchat 3.5 for optimal performance; a usage sketch follows this list.
- Accessibility: Available for testing on the LMSYS Chatbot Arena.
- License: Apache-2.0, under the condition that the model is not used to compete with OpenAI.
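
As a minimal sketch of single-turn usage with the Openchat 3.5 template (the prompt format follows the model card; generation parameters here are illustrative assumptions, not recommendations from the authors):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Single-turn Openchat 3.5 template: user turn, end-of-turn token,
# then the assistant tag that the model completes.
prompt = "GPT4 Correct User: Hello, how are you?<|end_of_turn|>GPT4 Correct Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # illustrative cap, not from the model card
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens (the assistant's reply).
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(reply)
```

For multi-turn conversations the same pattern repeats, with each completed turn closed by <|end_of_turn|> before the next GPT4 Correct User: or GPT4 Correct Assistant: tag; if the packaged tokenizer config ships a chat template, tokenizer.apply_chat_template can assemble this string instead.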