invalid-coder/Starling-LM-7B-beta-laser-dpo
Task: Text Generation
Concurrency Cost: 1
Model Size: 7B
Quant: FP8
Context Length: 4k
Published: Mar 29, 2024
License: apache-2.0
Architecture: Transformer

Starling-LM-7B-beta-laser-dpo is a 7-billion-parameter language model developed by invalid-coder, fine-tuned from Openchat-3.5-0106 (itself based on Mistral-7B-v0.1). It incorporates a training technique called laserRMT, which partially freezes the model during fine-tuning to prevent catastrophic forgetting of specific skills such as function calling. The model is optimized for improved helpfulness and harmlessness through Reinforcement Learning from AI Feedback (RLAIF) and achieves an MT-Bench score of 8.12.
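
Below is a minimal sketch of running the model locally with Hugging Face transformers. The repository ID comes from this page; the bf16 dtype and the OpenChat-style prompt format ("GPT4 Correct User: ... GPT4 Correct Assistant:", inherited from the openchat-3.5-0106 base) are assumptions, not details confirmed by this card.

```python
# Sketch: load and query the model with transformers (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "invalid-coder/Starling-LM-7B-beta-laser-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: hardware supports bf16
    device_map="auto",
)

# Assumed OpenChat-style prompt format from the openchat-3.5-0106 base.
prompt = (
    "GPT4 Correct User: What is catastrophic forgetting?<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```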
