jieliu/Storm-7B

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4K · Published: Apr 25, 2024 · License: apache-2.0 · Architecture: Transformer · Concurrency Cost: 1 · Open Weights

Storm-7B is a 7 billion parameter language model developed by Jie Liu, Zhanhui Zhou, Jiaheng Liu, Xingyuan Bu, Chao Yang, Han-Sen Zhong, and Wanli Ouyang, fine-tuned from openchat-3.5-0106. It uses iterative length-regularized Direct Preference Optimization (iLR-DPO) to reach performance comparable to GPT-4 on AlpacaEval 2.0, and is specifically optimized to improve response quality without increasing verbosity. The model targets high-quality, concise conversational AI applications, demonstrating a 50.5% length-controlled win rate against GPT-4 Preview.


Storm-7B: GPT-4 Level Performance in a 7B Model

Storm-7B is an open-source 7 billion parameter language model developed by Jie Liu and collaborators, fine-tuned from openchat-3.5-0106. It introduces iterative length-regularized Direct Preference Optimization (iLR-DPO), a novel training approach that addresses the common pitfall of increased verbosity in iterative DPO methods. By penalizing response length during training, iLR-DPO enhances response quality and alignment with human values without making the model more verbose.
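To make the length-regularization idea concrete, here is a minimal numeric sketch of a length-penalized DPO loss. The exact regularizer used in iLR-DPO is not reproduced here; this hypothetical form simply subtracts a term proportional to the length difference between the chosen and rejected responses from the standard DPO margin, so a preferred response cannot win merely by being longer. All function names and coefficient values are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lr_dpo_loss(logp_w: float, logp_l: float,
                ref_logp_w: float, ref_logp_l: float,
                len_w: int, len_l: int,
                beta: float = 0.1, alpha: float = 0.01) -> float:
    """Illustrative length-regularized DPO loss (hypothetical form).

    Standard DPO margin: beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)),
    where logp_* are policy log-probs and ref_logp_* are reference-model log-probs
    of the chosen (w) and rejected (l) responses. The alpha * (len_w - len_l)
    term penalizes the chosen response for being longer than the rejected one.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    margin -= alpha * (len_w - len_l)  # length penalty on the preferred response
    return -math.log(sigmoid(margin))

# With identical log-probs, a longer chosen response incurs a higher loss:
loss_short = lr_dpo_loss(-10.0, -12.0, -11.0, -11.5, len_w=50, len_l=50)
loss_long = lr_dpo_loss(-10.0, -12.0, -11.0, -11.5, len_w=120, len_l=50)
```

Under this toy penalty, `loss_long > loss_short`, which is the behavior the paper describes: quality gains are rewarded, extra tokens are not.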

Key Capabilities & Performance

  • GPT-4 Level Performance: Achieves a 50.5% length-controlled win rate against GPT-4 Preview on the AlpacaEval 2.0 leaderboard, making it the first open-source model to surpass GPT-4 Preview in this metric.
  • Verbosity Control: iLR-DPO ensures that improvements in response quality do not lead to increased response length, maintaining conciseness.
  • Enhanced Decoding: With beam search, the model shows a 5% improvement over regular decoding. Best-of-n sampling with the Starling-RM-34B reward model achieves a 61.6% length-controlled (LC) win rate, outperforming GPT-4 Omni.
  • Maintained NLP Performance: The model shows no significant degradation on traditional NLP tasks, as indicated by the Huggingface Open LLM Leaderboard.
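The best-of-n decoding described above can be sketched generically: draw n samples from the model, score each with a reward model, and keep the highest-scoring one. The `generate` and `reward` callables below are placeholders standing in for Storm-7B sampling and a scorer such as Starling-RM-34B; neither real model is invoked here.

```python
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate responses and return the one the reward model ranks highest.

    `generate` stands in for sampling from the policy model (e.g. Storm-7B);
    `reward` stands in for a reward model (e.g. Starling-RM-34B).
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda resp: reward(prompt, resp))

# Toy stand-ins: "generation" cycles through canned replies; "reward" favors brevity.
replies = iter(["a long rambling answer " * 3, "a concise answer", "ok"])
picked = best_of_n("q", lambda p: next(replies), lambda p, r: -len(r), n=3)
```

The pattern is model-agnostic: only the sampler and the scorer change, which is why swapping in a stronger reward model directly improves the selected responses.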

Use Cases & Limitations

Storm-7B is well-suited for conversational AI applications requiring high-quality, concise responses. It uses the same chat template as openchat-3.5-0106.
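Since the model inherits its chat template from openchat-3.5-0106, a prompt can be assembled as alternating "GPT4 Correct User" / "GPT4 Correct Assistant" turns separated by `<|end_of_turn|>`. This is a sketch of that template as used by the OpenChat family; in practice, prefer the tokenizer's built-in `apply_chat_template` and verify the format against the model's `tokenizer_config.json`.

```python
END_OF_TURN = "<|end_of_turn|>"

def format_openchat(messages: list[dict]) -> str:
    """Format a message list in the OpenChat-3.5 style that Storm-7B inherits.

    The role prefixes and separator shown here follow openchat-3.5-0106;
    confirm against the released tokenizer config before relying on them.
    """
    role_map = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    parts = [f"{role_map[m['role']]}: {m['content']}{END_OF_TURN}" for m in messages]
    # A trailing assistant tag cues the model to generate its reply.
    return "".join(parts) + "GPT4 Correct Assistant:"

prompt = format_openchat([{"role": "user", "content": "Hello"}])
# → "GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:"
```

With the Hugging Face tokenizer loaded, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` produces the equivalent string without hand-rolling the format.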

Limitations include reliance on GPT-4 as a proxy for human judgment in alignment and the use of a length penalty rather than a direct verbosity reward model. Future work may explore training a specific reward model for verbosity.