CCCCCC/VPO-5B

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kLicense:apache-2.0Architecture:Transformer0.0K Open Weights Cold

VPO-5B is an 8 billion parameter model developed by CCCCCC, designed for optimizing prompts for text-to-video generation models. It employs a two-stage process involving supervised fine-tuning guided by safety and alignment, followed by preference learning with text-level and video-level feedback. This model specializes in expanding short user queries into detailed, harmless, aligned, and high-quality video generation prompts, specifically trained to optimize for CogVideoX-5B.

Loading preview...

VPO-5B: Prompt Optimization for Text-to-Video Generation

VPO-5B is an 8 billion parameter model developed by CCCCCC, specifically engineered to optimize user prompts for text-to-video generation, particularly for models like CogVideoX-5B. This model utilizes a principled prompt optimization framework focused on harmlessness, accuracy, and helpfulness.

Key Capabilities

  • Two-Stage Optimization: Employs supervised fine-tuning to construct a dataset guided by safety and alignment, followed by preference learning using both text-level and video-level feedback.
  • Prompt Expansion: Transforms concise user queries into detailed, well-structured English prompts for video generation.
  • Safety and Alignment: Ensures generated prompts are safe, respectful, free from harmful content, and fully preserve the user's original intent.
  • High-Quality Video Generation: Formulates descriptive and vivid prompts to facilitate the creation of high-quality videos, ensuring feasibility and suitability for short durations.

Good For

  • Enhancing the quality and safety of inputs for text-to-video generation systems.
  • Automating the creation of detailed and aligned video prompts from simple user requests.
  • Developers working with CogVideoX-5B who need optimized and safe prompt inputs.