invalid-coder/Starling-LM-7B-beta-laser-dpo
Starling-LM-7B-beta-laser-dpo is a 7 billion parameter language model developed by invalid-coder, fine-tuned from Openchat-3.5-0106 (based on Mistral-7B-v0.1). It incorporates a novel training technique called laserRMT, which partially freezes the model to prevent catastrophic forgetting, particularly for specific skills like function calling. This model is optimized for improved helpfulness and harmlessness through Reinforcement Learning from AI Feedback (RLAIF) and achieves an 8.12 MT Bench score.
Model Overview
Starling-LM-7B-beta-laser-dpo is a 7 billion parameter language model that builds on the Nexusflow Team's Starling-LM-7B-beta, which was fine-tuned from Openchat-3.5-0106 (itself based on Mistral-7B-v0.1). The underlying Starling model was trained with Reinforcement Learning from AI Feedback (RLAIF) using the Nexusflow/Starling-RM-34B reward model and a policy optimization method based on PPO.
Key Differentiators
- Catastrophic Forgetting Prevention: Employs a novel laserRMT-inspired training technique that partially freezes the model. This method is designed to prevent the model from forgetting previously acquired knowledge, which is crucial for teaching specific skills like function calling.
- RLAIF Training: Trained using RLAIF with an upgraded reward model and policy tuning pipeline, leveraging the berkeley-nest/Nectar ranking dataset.
- Performance: Achieves an MT Bench score of 8.12, as evaluated by GPT-4.
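The card does not spell out how laserRMT selects which weights to freeze, but the general idea of partially freezing a model to protect previously learned skills can be sketched in PyTorch. The layer names and selection rule below are placeholders for illustration, not the actual laserRMT recipe:

```python
import torch
from torch import nn

def freeze_except(model: nn.Module, trainable_substrings: list[str]) -> int:
    """Freeze every parameter except those whose name contains one of the
    given substrings; returns the number of parameter tensors left trainable.
    Frozen weights cannot be overwritten during fine-tuning, which is the
    mechanism that guards against catastrophic forgetting."""
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in trainable_substrings)
        trainable += int(param.requires_grad)
    return trainable

# Tiny stand-in model (two "blocks" plus a head) for demonstration.
model = nn.Sequential(
    nn.Linear(8, 8),  # block "0"
    nn.Linear(8, 8),  # block "1"
    nn.Linear(8, 2),  # head, block "2"
)
n_trainable = freeze_except(model, ["2."])  # fine-tune only the head
```

In a real fine-tune for a skill like function calling, the trainable subset would be chosen per the laserRMT analysis; only the unfrozen parameters receive gradient updates.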
Usage Considerations
- Chat Template: Requires the exact same chat template as Openchat-3.5-0106 for optimal performance. This includes specific formatting for single-turn, multi-turn, and coding conversations.
- Verbosity: Model output can be verbose in rare cases; setting `temperature = 0` is suggested to mitigate this.
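The notes above can be made concrete with a small sketch of the OpenChat-style prompt format this model inherits. The `GPT4 Correct User` / `GPT4 Correct Assistant` role names and the `<|end_of_turn|>` separator follow the Openchat-3.5-0106 template; always confirm against the tokenizer's own chat template (e.g. `tokenizer.apply_chat_template` in `transformers`) before relying on this hand-rolled version:

```python
END = "<|end_of_turn|>"
ROLES = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}

def build_prompt(turns: list[tuple[str, str]]) -> str:
    """turns is a list of (role, message) pairs, role being 'user' or
    'assistant'. The prompt ends with an open assistant turn so the model
    continues from there."""
    parts = [f"{ROLES[role]}: {msg}{END}" for role, msg in turns]
    return "".join(parts) + "GPT4 Correct Assistant:"

single = build_prompt([("user", "Hello")])
multi = build_prompt([("user", "Hi"), ("assistant", "Hey"), ("user", "How?")])
```

On the verbosity point: with `transformers`, `temperature = 0` corresponds to greedy decoding, typically expressed as `do_sample=False` in `model.generate(...)`.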
Good For
- Applications requiring a 7B parameter model with enhanced helpfulness and harmlessness.
- Scenarios where preventing catastrophic forgetting of specific learned skills (e.g., function calling) is critical.
- Developers familiar with the Openchat-3.5-0106 chat template and usage patterns.