berkeley-nest/Starling-LM-7B-alpha

Hugging Face

- Task: Text generation
- Model size: 7B
- Quantization: FP8
- Context length: 4k
- Published: Nov 25, 2023
- License: apache-2.0
- Architecture: Transformer
- Open weights: Yes

Starling-LM-7B-alpha is a 7 billion parameter language model developed by Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, and Jiantao Jiao, fine-tuned from Openchat 3.5 (based on Mistral-7B-v0.1). This model is optimized using Reinforcement Learning from AI Feedback (RLAIF) and the advantage-induced policy alignment (APA) method, leveraging the GPT-4 labeled Nectar ranking dataset. It achieves an MT-Bench score of 8.09, outperforming many models in its class and excelling in helpfulness and harmlessness.


Overview

Starling-LM-7B-alpha is fine-tuned from Openchat 3.5, which is itself based on Mistral-7B-v0.1. Its key differentiator is the training methodology: Reinforcement Learning from AI Feedback (RLAIF) combined with the advantage-induced policy alignment (APA) policy optimization method, trained on Nectar, a GPT-4-labeled ranking dataset.

Key Capabilities & Performance

  • RLAIF Optimization: Trained with Reinforcement Learning from AI Feedback, enhancing helpfulness and harmlessness.
  • High MT-Bench Score: Achieves an MT-Bench score of 8.09, placing it among the top-performing open models and outperforming many larger models, as judged by GPT-4.
  • Policy Optimization: Employs the advantage-induced policy alignment (APA) method for fine-tuning.
  • Open-source Components: Releases the Nectar ranking dataset and the Starling-RM-7B-alpha reward model alongside the language model.
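To illustrate how a ranking dataset like Nectar can supervise a reward model such as Starling-RM-7B-alpha, here is a minimal sketch of the standard Bradley-Terry pairwise preference loss over a ranked list of responses. The function name and the use of plain Python scores are assumptions for illustration; this is not the Starling training code.

```python
import math
from itertools import combinations

def pairwise_preference_loss(scores):
    """Bradley-Terry loss over reward-model scores for one ranked list.

    `scores` is ordered best-to-worst, as in a ranking dataset.
    Each (better, worse) pair contributes -log(sigmoid(r_better - r_worse)),
    which pushes the reward model to score preferred responses higher.
    """
    total, n = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        diff = scores[i] - scores[j]            # better minus worse
        total += math.log(1.0 + math.exp(-diff))  # = -log sigmoid(diff)
        n += 1
    return total / n

# A scoring consistent with the ranking yields a low loss...
good = pairwise_preference_loss([3.0, 1.0, -1.0])
# ...while an inverted scoring yields a high one.
bad = pairwise_preference_loss([-1.0, 1.0, 3.0])
print(good < bad)  # True
```

Nectar provides 7-wise rankings, which decompose into pairwise comparisons exactly as the `combinations` loop above does.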

Usage & Integration

  • Chat Template: Uses the same chat template as Openchat 3.5; following it exactly is required for optimal performance.
  • Accessibility: Available for testing on the LMSYS Chatbot Arena.
  • License: Apache-2.0; note that the underlying data is subject to terms restricting use to compete with OpenAI.
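Since the model expects the Openchat 3.5 chat template, a small helper that builds the expected prompt string can be sketched as follows. The helper name is an assumption, but the `GPT4 Correct User` / `GPT4 Correct Assistant` role markers and the `<|end_of_turn|>` separator follow the published Openchat 3.5 format.

```python
def build_openchat_prompt(messages):
    """Format a chat history into the Openchat 3.5 template used by
    Starling-LM-7B-alpha: role-tagged turns separated by <|end_of_turn|>,
    ending with an open assistant turn for the model to complete."""
    role_tag = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    parts = [f"{role_tag[m['role']]}: {m['content']}<|end_of_turn|>" for m in messages]
    return "".join(parts) + "GPT4 Correct Assistant:"

prompt = build_openchat_prompt([{"role": "user", "content": "Hello"}])
print(prompt)
# GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:
```

In practice you would pass this string to the model's tokenizer; the tokenizer shipped with `berkeley-nest/Starling-LM-7B-alpha` also encodes this template, so `tokenizer.apply_chat_template` should produce an equivalent prompt.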