GenEvolve: Self-Evolving Image Generation Agent

GenEvolve is an 8-billion-parameter agent policy built on the Qwen3-VL-8B-Instruct backbone and designed for image generation. Rather than generating images itself, GenEvolve acts as an orchestrator: it produces a (gen_prompt, reference_images) program that drives any reference-conditioned downstream image generator. This lets it draw on external tools and internal knowledge when composing that program.
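To make the (gen_prompt, reference_images) program concrete, here is a minimal Python sketch of how such an output could be represented and validated. The class name GenProgram, the helper parse_program, and the JSON layout are assumptions for illustration only; the card does not specify the actual serialization format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenProgram:
    """Hypothetical container; field names mirror the tuple named in the card."""
    gen_prompt: str
    reference_images: tuple[str, ...] = ()  # assumed to be file paths or URLs

    def to_json(self) -> str:
        # Tuples are serialized as JSON arrays.
        return json.dumps(asdict(self))


def parse_program(raw: str) -> GenProgram:
    """Parse an assumed JSON encoding of the program and reject empty prompts."""
    data = json.loads(raw)
    prompt = data.get("gen_prompt", "").strip()
    if not prompt:
        raise ValueError("gen_prompt must be a non-empty string")
    return GenProgram(prompt, tuple(data.get("reference_images", [])))
```

Keeping the program as a small immutable value makes it easy to log, compare, and hand to whichever generator sits downstream.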

Key Capabilities

Tool-Orchestrated Trajectories: The agent calls tools such as search, image_search, and query_knowledge (8 distinct generation skills) to gather information before formulating the final image generation program.
Self-Evolution with Visual Experience Distillation: GenEvolve improves through a self-evolution mechanism that distills best-vs-worst trajectory pairs into the deployed student policy, so the gains require no runtime memory at inference.
Generator-Transferable: The same trained GenEvolve policy performs robustly across different image generators, including open-source options such as Qwen-Image-Edit and proprietary models such as Nano Banana Pro.
Enhanced Knowledge Anchoring: On benchmarks such as GenEvolve-Bench and WISE, GenEvolve anchors generated images to specific knowledge and maintains quality better than raw generators and other search-based agents.
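The tool-calling and best-vs-worst pairing ideas above can be sketched roughly as follows. This is an illustrative toy, not the project's training code: the tool bodies are stubs, and the Trajectory class, the score field, and the rollout and best_worst_pair helpers are hypothetical names. How GenEvolve actually scores trajectories and distills the pairs is not described in this card.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stub tools standing in for the real search, image_search, and query_knowledge.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"text results for {q!r}",
    "image_search": lambda q: f"image results for {q!r}",
    "query_knowledge": lambda q: f"skill notes for {q!r}",
}


@dataclass
class Trajectory:
    calls: list[tuple[str, str]]  # (tool name, query) in call order
    observations: list[str] = field(default_factory=list)
    score: float = 0.0  # hypothetical quality score assigned after generation


def rollout(calls: list[tuple[str, str]]) -> Trajectory:
    """Execute each tool call in order and record what it returned."""
    traj = Trajectory(calls=calls)
    for name, query in calls:
        if name not in TOOLS:
            raise KeyError(f"unknown tool: {name}")
        traj.observations.append(TOOLS[name](query))
    return traj


def best_worst_pair(trajs: list[Trajectory]) -> tuple[Trajectory, Trajectory]:
    """Return the highest- and lowest-scoring trajectories for one request."""
    if len(trajs) < 2:
        raise ValueError("need at least two trajectories to form a pair")
    ranked = sorted(trajs, key=lambda t: t.score)
    return ranked[-1], ranked[0]
```

In a pipeline of this shape, the selected pair would become a training signal for the student policy, which is why nothing needs to be retrieved from memory at inference time.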

Good For

Research in Agentic Image Generation: Ideal for exploring tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation techniques.
Complex Visual Request Fulfillment: Suited to requests that need detailed, knowledge-anchored images built from external information and structured prompts.
Driving Diverse Image Generators: Can be used as a front-end orchestrator for various reference-conditioned image generation models, providing consistent and high-quality inputs.
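As a rough sketch of the front-end orchestrator pattern, the snippet below passes one program to several interchangeable backends. The ImageGenerator protocol, the EchoGenerator stub, and the render helper are assumptions made for illustration; real adapters for Qwen-Image-Edit or other generators would wrap those models' own interfaces, which are not shown here.

```python
from typing import Protocol, Sequence


class ImageGenerator(Protocol):
    """Assumed minimal interface for a reference-conditioned generator."""

    def generate(self, prompt: str, references: Sequence[str]) -> bytes: ...


class EchoGenerator:
    """Stub backend that returns a description instead of image bytes."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str, references: Sequence[str]) -> bytes:
        return f"{self.name}|{prompt}|{len(references)} refs".encode()


def render(
    program: tuple[str, Sequence[str]],
    backends: Sequence[ImageGenerator],
) -> list[bytes]:
    """Send the same (gen_prompt, reference_images) program to every backend."""
    gen_prompt, reference_images = program
    return [b.generate(gen_prompt, reference_images) for b in backends]
```

Because the policy's output is independent of any single backend, swapping generators only means adding another adapter that satisfies the same interface.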
