zai-org/BPO
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Nov 20, 2023 · Architecture: Transformer

zai-org/BPO is a 7 billion parameter prompt optimization model built on Llama-2-7b-chat-hf, designed to align large language models by optimizing user inputs rather than by training the models themselves. This black-box alignment technique improves the helpfulness and harmlessness of responses from a variety of LLMs, including API-based models such as GPT-3.5-turbo and Claude-2. In benchmarks, prompts rewritten by BPO achieve notably higher win rates than the original prompts, and in some scenarios the optimized outputs even beat those of PPO- or DPO-aligned models. The model is intended primarily as a plug-and-play prompt optimizer for improving the quality of LLM interactions.


Black-Box Prompt Optimization (BPO)

zai-org/BPO is a 7 billion parameter model built on Llama-2-7b-chat-hf that implements a novel black-box alignment technique for Large Language Models (LLMs). Unlike traditional methods such as PPO or DPO that require model training, BPO optimizes LLMs by refining user inputs (prompts). This allows it to be applied to a wide range of LLMs, including both open-source and API-based models, without needing to modify their internal weights.

Key Capabilities

  • Prompt Optimization: BPO is trained on prompt optimization pairs containing human preference features to generate improved prompts that lead to more helpful and harmless LLM responses.
  • Broad Applicability: It functions as a plug-and-play solution, compatible with various LLMs like GPT-3.5-turbo, Claude-2, and Llama-2-13b-chat, enhancing their output quality.
  • Performance Improvement: Benchmarks show that BPO significantly increases the win rate of optimized prompts over original prompts, and in some cases, outperforms models aligned with PPO or DPO.
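Because BPO is a standard causal LM on the Hugging Face Hub, prompt optimization reduces to wrapping the user's prompt in an instruction template and generating. The sketch below is a minimal example; the exact template wording is an assumption modeled on the BPO repository, so check the model card before relying on it.

```python
# Minimal sketch of using zai-org/BPO as a standalone prompt optimizer.
# NOTE: BPO_TEMPLATE is an assumed instruction template; the authoritative
# wording lives in the BPO repository / model card.

BPO_TEMPLATE = (
    "[INST] You are an expert prompt engineer. Please help me improve this "
    "prompt to get a more helpful and harmless response:\n{} [/INST]"
)


def build_optimizer_input(user_prompt: str) -> str:
    """Wrap a raw user prompt in the BPO instruction template."""
    return BPO_TEMPLATE.format(user_prompt.strip())


def optimize_prompt(user_prompt, model, tokenizer, max_new_tokens=256):
    """Generate an optimized prompt with an already-loaded BPO model."""
    inputs = tokenizer(build_optimizer_input(user_prompt),
                       return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs,
                             max_new_tokens=max_new_tokens,
                             do_sample=False)
    # Strip the echoed input; return only the newly generated prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("zai-org/BPO")
    mdl = AutoModelForCausalLM.from_pretrained("zai-org/BPO",
                                               device_map="auto")
    print(optimize_prompt("Tell me about black holes.", mdl, tok))
```

The `[INST] ... [/INST]` framing follows the Llama-2-chat convention, which is a natural fit since BPO is fine-tuned from Llama-2-7b-chat-hf.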

Good For

  • Enhancing LLM Responses: Ideal for users looking to improve the quality, helpfulness, and harmlessness of outputs from their chosen LLMs without complex fine-tuning.
  • API-based LLM Alignment: Particularly useful for optimizing interactions with proprietary LLMs where direct model training is not feasible.
  • Research in Alignment Techniques: Offers an alternative perspective to training-based alignment, focusing on input optimization.
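The plug-and-play workflow above amounts to a two-stage pipeline: BPO rewrites the prompt, and the rewritten prompt is forwarded to the target LLM. A hedged sketch, with both stages passed in as callables (the stub implementations below are purely illustrative; in practice `optimize` would wrap the BPO model and `llm_call` an API client for, say, GPT-3.5-turbo or Claude-2):

```python
from typing import Callable


def bpo_pipeline(user_prompt: str,
                 optimize: Callable[[str], str],
                 llm_call: Callable[[str], str]) -> dict:
    """Two-stage flow: rewrite the prompt with BPO, then query the target LLM.

    Returning all three strings makes it easy to A/B the optimized prompt
    against the original, as in the BPO win-rate evaluations.
    """
    improved = optimize(user_prompt)
    return {
        "original": user_prompt,
        "optimized": improved,
        "response": llm_call(improved),
    }


# Stub callables for illustration only.
demo = bpo_pipeline(
    "Explain recursion.",
    optimize=lambda p: p + " Use a concrete example and keep it concise.",
    llm_call=lambda p: f"(target LLM answers: {p})",
)
```

Because the target model only ever sees plain text, this works identically for open-source and proprietary LLMs, which is the core appeal of black-box alignment.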

Limitations

  • Task Coverage: The model was trained on roughly 14k optimized prompt pairs drawn from open-source data, so its task coverage is limited and performance may vary across the full breadth of user queries.
  • Specific Task Underperformance: It may underperform on tasks requiring long-context processing or complex mathematical problems due to a smaller representation of such tasks in its training data.