Nexusflow/Starling-LM-7B-beta
Starling-LM-7B-beta is a 7 billion parameter language model developed by the Nexusflow Team, fine-tuned from Openchat-3.5-0106 (based on Mistral-7B-v0.1) using Reinforcement Learning from AI Feedback (RLAIF). This model leverages a new reward model, Starling-RM-34B, and the Nectar ranking dataset, reaching an MT-Bench score of 8.12 with GPT-4 as judge, an improvement over its alpha predecessor. It is optimized for generating helpful and harmless responses, making it suitable for general conversational AI applications.
Starling-LM-7B-beta: RLAIF-Tuned for Enhanced Performance
Starling-LM-7B-beta is a 7 billion parameter language model developed by the Nexusflow Team. It is fine-tuned from Openchat-3.5-0106, which itself is based on Mistral-7B-v0.1.
Key Capabilities & Training:
- RLAIF Training: The model is trained using Reinforcement Learning from AI Feedback (RLAIF), a method that utilizes a reward model to optimize for helpfulness and harmlessness.
- Advanced Reward Model: It incorporates a new reward model, Nexusflow/Starling-RM-34B, trained on the berkeley-nest/Nectar ranking dataset.
- Performance: Starling-LM-7B-beta achieves an MT-Bench score of 8.12 (evaluated by GPT-4), improving on Starling-LM-7B-alpha and indicating strong conversational abilities.
- Chat Template Adherence: Requires a specific chat template for optimal performance, identical to Openchat-3.5-0106, supporting single-turn, multi-turn, and coding conversations (see the sketch after this list).
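Below is a minimal sketch of prompting the model with the Openchat-3.5-0106 template via Hugging Face transformers. The "GPT4 Correct User" / "GPT4 Correct Assistant" role markers follow the published OpenChat format; the prompt content, generation settings, and dtype/device choices are illustrative assumptions, so verify the exact template strings against the model card before relying on them.

```python
# Sketch: single-turn prompting of Starling-LM-7B-beta with the
# OpenChat-3.5-0106 chat template (assumed format; check the model card).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Nexusflow/Starling-LM-7B-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "GPT4 Correct User:" / "GPT4 Correct Assistant:" are literal role markers
# in the template, not placeholders; <|end_of_turn|> closes each turn.
prompt = "GPT4 Correct User: Hello, how are you?<|end_of_turn|>GPT4 Correct Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

For coding conversations the same template family reportedly swaps the role markers (e.g. "Code User:" / "Code Assistant:"); multi-turn prompts simply concatenate alternating user and assistant turns, each ended with <|end_of_turn|>.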
When to Use This Model:
- General Conversational AI: Ideal for applications requiring robust and helpful dialogue generation.
- Research in RLAIF: Useful for researchers exploring advanced reinforcement learning techniques for language models.
- Benchmarking: Can serve as a strong baseline for evaluating conversational AI systems, particularly given its MT Bench score.
Note: Users must adhere to the specified chat template for best results. The model is available for testing on LMSYS Chatbot Arena.