CCCCCC/VPO-5B
VPO-5B is an 8 billion parameter model developed by CCCCCC, designed for optimizing prompts for text-to-video generation models. It employs a two-stage process involving supervised fine-tuning guided by safety and alignment, followed by preference learning with text-level and video-level feedback. This model specializes in expanding short user queries into detailed, harmless, aligned, and high-quality video generation prompts, specifically trained to optimize for CogVideoX-5B.
Loading preview...
VPO-5B: Prompt Optimization for Text-to-Video Generation
VPO-5B is an 8 billion parameter model developed by CCCCCC, specifically engineered to optimize user prompts for text-to-video generation, particularly for models like CogVideoX-5B. This model utilizes a principled prompt optimization framework focused on harmlessness, accuracy, and helpfulness.
Key Capabilities
- Two-Stage Optimization: Employs supervised fine-tuning to construct a dataset guided by safety and alignment, followed by preference learning using both text-level and video-level feedback.
- Prompt Expansion: Transforms concise user queries into detailed, well-structured English prompts for video generation.
- Safety and Alignment: Ensures generated prompts are safe, respectful, free from harmful content, and fully preserve the user's original intent.
- High-Quality Video Generation: Formulates descriptive and vivid prompts to facilitate the creation of high-quality videos, ensuring feasibility and suitability for short durations.
Good For
- Enhancing the quality and safety of inputs for text-to-video generation systems.
- Automating the creation of detailed and aligned video prompts from simple user requests.
- Developers working with CogVideoX-5B who need optimized and safe prompt inputs.