Gen-Searcher-8B: A Deep Research Agent for Image Generation

Gen-Searcher-8B is an 8-billion-parameter multimodal deep research agent. Its authors describe it as the first attempt to train such an agent specifically for image generation tasks that demand complex real-world knowledge. Developed by Kaituo Feng and his team, the model researches a request extensively before generation in order to improve the accuracy and relevance of the synthesized image.

Key Capabilities

Agentic Search: Searches the web and browses pages to gather evidence.
Multi-Source Reasoning: Reasons over multiple sources of information to inform image generation.
Visual Reference Search: Searches for visual references to guide the synthesis process.
Enhanced Accuracy: Enables more accurate and up-to-date image generation in real-world scenarios.
Benchmark Performance: Reported gains exceed 15 points on the KnowGen and WISE benchmarks.
Transferability: Demonstrates strong transferability across various image generators.
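
The capabilities above suggest a search-then-generate loop, which the following minimal Python sketch illustrates. It is not the released implementation: `Evidence`, `ResearchState`, `run_research`, `build_grounded_prompt`, and the two stub search functions are hypothetical names introduced here, and the fixed follow-up query stands in for the step where the model itself would decide what to search next.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Evidence:
    kind: str     # "text" for a snippet, "image" for a visual reference
    source: str   # where the item came from
    content: str  # snippet text or image URL


@dataclass
class ResearchState:
    request: str
    evidence: List[Evidence] = field(default_factory=list)


SearchFn = Callable[[str], List[Evidence]]


def run_research(request: str, text_search: SearchFn,
                 image_search: SearchFn, max_steps: int = 3) -> ResearchState:
    """Collect text evidence over a few search steps, then visual references."""
    state = ResearchState(request=request)
    query = request
    for step in range(max_steps):
        hits = text_search(query)
        if not hits:
            break  # nothing new found, stop searching
        state.evidence.extend(hits)
        # Assumption: a real agent would let the model write the next query.
        query = f"{request} (follow-up {step + 1})"
    state.evidence.extend(image_search(request))
    return state


def build_grounded_prompt(state: ResearchState) -> Tuple[str, List[str]]:
    """Merge gathered facts into a prompt and return image reference URLs."""
    facts = [e.content for e in state.evidence if e.kind == "text"]
    refs = [e.content for e in state.evidence if e.kind == "image"]
    prompt = state.request
    if facts:
        prompt += ". Known facts: " + "; ".join(facts)
    return prompt, refs


# Hypothetical stand-ins for real web and image search tools.
def fake_text_search(query: str) -> List[Evidence]:
    return [Evidence("text", "example.org", f"fact about {query}")]


def fake_image_search(query: str) -> List[Evidence]:
    return [Evidence("image", "example.org", "https://example.org/reference.png")]


if __name__ == "__main__":
    state = run_research("a landmark", fake_text_search, fake_image_search, max_steps=2)
    prompt, refs = build_grounded_prompt(state)
    print(prompt)
    print(refs)
```

Because the sketch returns a plain prompt plus a list of reference URLs, the result could in principle be handed to any downstream image generator, which is one way to read the transferability claim above.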

Training and Resources

The model was trained on two dedicated datasets: Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k. The authors also introduced a new benchmark, KnowGen, for search-grounded image generation. All code, models, data, and benchmarks are fully released in the project's GitHub repository and described in the accompanying paper.

Good For

Applications requiring image generation grounded in real-world, up-to-date information.
Developers looking for a model that can perform deep research before synthesizing images.
Research into agentic AI for multimodal tasks and complex knowledge integration.
