PromptEnhancer/PromptEnhancer-32B
PromptEnhancer/PromptEnhancer-32B is a 32 billion parameter multimodal language model, fine-tuned from Qwen/Qwen2.5-VL-32B-Instruct, developed by PromptEnhancer. This model specializes in text-to-image prompt enhancement and rewriting, utilizing chain-of-thought reasoning to restructure user input prompts. It preserves original intent while producing clearer, layered, and logically consistent prompts for improved image generation tasks, supporting both Chinese and English with a 32768 token context length.
Loading preview...
PromptEnhancerV2 (32B) Overview
PromptEnhancerV2 is a specialized 32 billion parameter multimodal language model, fine-tuned from Qwen/Qwen2.5-VL-32B-Instruct, designed for enhancing and rewriting text-to-image prompts. It employs chain-of-thought reasoning to transform user input into clearer, more structured, and logically consistent prompts, optimizing them for downstream image generation tasks while maintaining the original intent.
Key Capabilities
- Prompt Enhancement: Restructures and refines user-provided text-to-image prompts.
- Multilingual Support: Processes and enhances prompts in both Chinese (zh) and English (en).
- Chain-of-Thought Reasoning: Utilizes advanced reasoning to produce layered and coherent prompt outputs.
- Multimodal Integration: Built upon a Vision-Language Model foundation, suitable for text-to-image workflows.
Use Cases
- Improving Image Generation Quality: Generates more effective prompts for better results from text-to-image models.
- Prompt Optimization: Helps users articulate their creative vision more precisely for AI art generation.
- Multilingual Prompting: Facilitates prompt enhancement for a broader user base.
Evaluation
The model's performance is evaluated using the T2I-Keypoints-Eval dataset, which includes a diverse range of text-to-image prompts across various categories and languages.