Model Overview
Starling-LM-7B-beta-laser-dpo is a 7-billion-parameter language model developed by the Nexusflow Team, fine-tuned from Openchat-3.5-0106 (itself based on Mistral-7B-v0.1). The model is trained with Reinforcement Learning from AI Feedback (RLAIF), using the Nexusflow/Starling-RM-34B reward model and a PPO-based policy optimization method.
Key Differentiators
- Catastrophic Forgetting Prevention: Employs a novel laserRMT-inspired training technique that partially freezes the model's weights. This is designed to keep the model from forgetting previously acquired knowledge, which is crucial when teaching specific skills such as function calling.
- RLAIF Training: Trained using RLAIF with an upgraded reward model and policy tuning pipeline, leveraging the berkeley-nest/Nectar ranking dataset.
- Performance: Achieves an MT Bench score of 8.12, as evaluated by GPT-4.
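The partial-freeze idea behind the laserRMT-inspired technique can be sketched as follows. This is an illustrative sketch only, not the Nexusflow training code: the parameter-name pattern and the choice of which layer indices stay trainable are assumptions for the example.

```python
import re

def select_trainable(param_names, trainable_layers):
    """Partition parameter names into frozen and trainable sets.

    Illustrative sketch of a partial freeze: only the layer indices
    listed in `trainable_layers` remain trainable; everything else is
    frozen to preserve previously learned behavior.
    """
    frozen, trainable = [], []
    for name in param_names:
        m = re.search(r"layers\.(\d+)\.", name)
        if m and int(m.group(1)) in trainable_layers:
            trainable.append(name)
        else:
            frozen.append(name)
    return frozen, trainable

# Hypothetical parameter names in a Mistral-style checkpoint.
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.mlp.down_proj.weight",
    "model.layers.2.self_attn.q_proj.weight",
    "lm_head.weight",
]
frozen, trainable = select_trainable(names, trainable_layers={1})
```

In an actual fine-tuning run, each name in `frozen` would have its parameter's `requires_grad` set to `False` before training, so the optimizer only updates the selected layers.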
Usage Considerations
- Chat Template: Requires exactly the same chat template as Openchat-3.5-0106 for optimal performance. This includes specific formatting for single-turn, multi-turn, and coding conversations.
- Verbosity: Model output can be verbose in rare cases; setting temperature = 0 is suggested to mitigate this.
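A minimal sketch of the OpenChat-style prompt formatting described above. The `GPT4 Correct User`/`GPT4 Correct Assistant` and `<|end_of_turn|>` tokens follow the published OpenChat 3.5 format, but in practice the template should be taken from the Openchat-3.5-0106 tokenizer itself (e.g. via `tokenizer.apply_chat_template`) rather than hand-built.

```python
def build_prompt(turns, mode="GPT4 Correct"):
    """Build an Openchat-3.5-0106-style prompt string.

    `turns` is a list of (role, text) pairs with role in
    {"user", "assistant"}. OpenChat also documents a "Code" mode
    for coding conversations, selectable via `mode`.
    """
    parts = []
    for role, text in turns:
        speaker = f"{mode} {'User' if role == 'user' else 'Assistant'}"
        parts.append(f"{speaker}: {text}<|end_of_turn|>")
    # Trailing assistant tag cues the model to respond.
    parts.append(f"{mode} Assistant:")
    return "".join(parts)

prompt = build_prompt([("user", "Hello")])
# → "GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:"
```

Note that "temperature = 0" corresponds to greedy decoding; in the Hugging Face transformers generation API the equivalent is `do_sample=False`.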
Good For
- Applications requiring a 7B parameter model with enhanced helpfulness and harmlessness.
- Scenarios where preventing catastrophic forgetting of specific learned skills (e.g., function calling) is critical.
- Developers familiar with the Openchat-3.5-0106 chat template and usage patterns.