GenSearcher/Gen-Searcher-8B
GenSearcher/Gen-Searcher-8B is an 8-billion-parameter multimodal deep research agent developed by Kaituo Feng and his team for image generation that requires complex real-world knowledge. Before image synthesis, the model searches the web, browses evidence, reasons over multiple sources, and searches for visual references. It reports gains of over 15 points on the KnowGen and WISE benchmarks, which makes it suited to accurate, up-to-date image generation.
Gen-Searcher-8B: A Deep Research Agent for Image Generation
Gen-Searcher-8B is an 8-billion-parameter multimodal deep research agent, presented as the first attempt to train such an agent specifically for image generation tasks that demand complex real-world knowledge. Developed by Kaituo Feng and his team, the model aims to make image synthesis more accurate and relevant by researching the prompt before any image is generated.
Key Capabilities
- Agentic Search: Searches the web and browses the retrieved evidence to gather information.
- Multi-Source Reasoning: Reasons over multiple sources of information to inform image generation.
- Visual Reference Search: Searches for visual references to guide the synthesis process.
- Enhanced Accuracy: Enables more accurate and up-to-date image generation in real-world scenarios.
- Benchmark Performance: Reports gains of over 15 points on the KnowGen and WISE benchmarks.
- Transferability: Demonstrates strong transferability across various image generators.
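The capabilities above can be pictured as a research-then-generate loop. The following is a minimal sketch under stated assumptions, not the model's actual interface: every name in it (Evidence, research_then_generate, the fake_* stubs) is hypothetical, the search tools are offline stand-ins, and the "reasoning" step is reduced to keeping one fact per source. It is only meant to show how gathered text and visual references could be merged and handed to an interchangeable image generator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    """One piece of gathered evidence (hypothetical structure)."""
    source: str   # where it came from, e.g. a site name
    kind: str     # "text" or "image"
    content: str  # snippet text, or a path/URL for an image reference


# Hypothetical tool signature: a query in, a list of evidence out.
Tool = Callable[[str], List[Evidence]]
# Hypothetical generator signature: grounded prompt + reference images -> result.
Generator = Callable[[str, List[str]], str]


def research_then_generate(
    prompt: str,
    text_search: Tool,
    image_search: Tool,
    generator: Generator,
) -> str:
    """Gather text and visual evidence, merge it, then call the generator."""
    evidence = text_search(prompt) + image_search(prompt)

    # Stand-in for multi-source reasoning: keep the first fact per source.
    seen, facts = set(), []
    for ev in evidence:
        if ev.kind == "text" and ev.source not in seen:
            seen.add(ev.source)
            facts.append(ev.content)
    references = [ev.content for ev in evidence if ev.kind == "image"]

    grounded = prompt if not facts else f"{prompt}. Known facts: " + "; ".join(facts)
    # The generator is just a callable, so any image backend can be swapped in.
    return generator(grounded, references)


# Stub tools so the sketch runs offline; real tools would call a search service.
def fake_text_search(query: str) -> List[Evidence]:
    return [
        Evidence("site-a.example", "text", "the tower has three spires"),
        Evidence("site-a.example", "text", "duplicate source, skipped"),
        Evidence("site-b.example", "text", "the facade is red brick"),
    ]


def fake_image_search(query: str) -> List[Evidence]:
    return [Evidence("site-c.example", "image", "ref_001.jpg")]


def fake_generator(grounded_prompt: str, refs: List[str]) -> str:
    # A real generator would return pixels; this returns a description.
    return f"image({grounded_prompt!r}, refs={refs})"


output = research_then_generate(
    "A photo of the town hall", fake_text_search, fake_image_search, fake_generator
)
print(output)
```

Passing the generator as a plain callable mirrors the transferability point: the research step produces a grounded prompt and a list of references, and any backend that accepts those two inputs could consume them.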
Training and Resources
The model was trained on two dedicated datasets: Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k. The authors also introduced a new benchmark, KnowGen, for search-grounded image generation. All code, models, data, and benchmarks are released in the project's GitHub repository and described in the accompanying paper.
Good For
- Applications requiring image generation grounded in real-world, up-to-date information.
- Developers looking for a model that can perform deep research before synthesizing images.
- Research into agentic AI for multimodal tasks and complex knowledge integration.