GenSearcher/Gen-Searcher-SFT-8B

VISION | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Mar 25, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

Gen-Searcher-SFT-8B is an 8-billion-parameter supervised fine-tuned (SFT) model developed by Feng et al. for agentic search in image generation. It is designed to search the web, browse evidence, reason over multiple sources, and find visual references before image synthesis. It is an intermediate checkpoint intended for subsequent reinforcement learning with the GRPO algorithm, and it targets image synthesis tasks that require complex real-world knowledge.

Overview

Gen-Searcher-SFT-8B is an 8-billion-parameter supervised fine-tuned (SFT) model, developed as part of the Gen-Searcher project by Feng et al. It acts as a multimodal deep research agent for image generation, particularly in scenarios that require complex real-world knowledge. It is an intermediate stage: the checkpoint is intended for further reinforcement learning (RL) training using the GRPO algorithm with dual reward feedback.
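To make the GRPO step more concrete, the following is a minimal sketch of the group-relative advantage that GRPO computes for a set of rollouts sampled from the same prompt. The source does not say how the two rewards are defined or combined, so the weighted sum, the weights, and the names `reward_a` and `reward_b` are illustrative assumptions, not the project's actual training code.

```python
from statistics import mean, pstdev


def combine_rewards(reward_a, reward_b, w_a=0.5, w_b=0.5):
    """Merge two reward signals into one scalar.

    Assumption: a weighted sum with equal weights. The actual dual-reward
    scheme used for Gen-Searcher is not described in this card.
    """
    return w_a * reward_a + w_b * reward_b


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so rollouts
    are scored relative to their siblings rather than by a learned critic.
    Population standard deviation is used here as a simplifying choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one prompt, each scored by two rewards.
rollouts = [(0.9, 0.7), (0.4, 0.5), (0.6, 0.6), (0.1, 0.2)]
rewards = [combine_rewards(a, b) for a, b in rollouts]
advantages = group_relative_advantages(rewards)

for (a, b), adv in zip(rollouts, advantages):
    print(f"reward_a={a:.1f} reward_b={b:.1f} advantage={adv:+.3f}")
```

Rollouts that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged. If every rollout in a group gets the same reward, all advantages are zero and that group contributes no learning signal.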

Key Capabilities

  • Agentic Search: Capable of searching the web, browsing evidence, and reasoning over multiple sources.
  • Visual Reference Search: Integrates the ability to search for visual references to inform image generation.
  • Enhanced Accuracy: Aims to enable more accurate and up-to-date image synthesis by grounding generation in real-world information.
  • Performance Gains: Achieves gains of more than 15 points on the KnowGen and WISE benchmarks and transfers well across different image generators.
  • Dataset Development: Utilizes dedicated training datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, and introduces a new benchmark, KnowGen, for search-grounded image generation.
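The search-then-generate workflow described in the capabilities above can be sketched roughly as follows. This is an illustration only: `web_search`, `browse`, `image_search`, and the `Evidence` container are hypothetical stubs invented for this example, not the model's real tool interface, and the fixed call order stands in for decisions that an actual agent would make step by step.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Facts and visual references collected before image generation."""
    facts: list = field(default_factory=list)
    image_urls: list = field(default_factory=list)


def web_search(query):
    """Hypothetical stub: return candidate page URLs for a query."""
    return [f"https://example.com/{query.replace(' ', '-')}"]


def browse(url):
    """Hypothetical stub: return text extracted from a page."""
    return f"summary of {url}"


def image_search(query):
    """Hypothetical stub: return URLs of candidate reference images."""
    return [f"https://example.com/img/{query.replace(' ', '-')}.jpg"]


def gather_evidence(request, max_pages=3):
    """Search, browse the top pages, then look up visual references."""
    evidence = Evidence()
    for url in web_search(request)[:max_pages]:
        evidence.facts.append(browse(url))
    evidence.image_urls.extend(image_search(request))
    return evidence


def build_grounded_prompt(request, evidence):
    """Fold the collected evidence into a prompt for an image generator."""
    facts = "; ".join(evidence.facts)
    return (
        f"{request}\n"
        f"Known facts: {facts}\n"
        f"Reference images: {len(evidence.image_urls)}"
    )


request = "latest Mars rover"
evidence = gather_evidence(request)
prompt = build_grounded_prompt(request, evidence)
print(prompt)
```

Keeping evidence gathering separate from prompt construction means the grounded prompt can be handed to any downstream image generator, which is consistent with the transferability claim in the list above.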

Good For

  • Research in Agentic AI: Ideal for researchers exploring deep research agents and multimodal reasoning.
  • Image Generation with External Knowledge: Suitable for applications requiring image synthesis grounded in up-to-date, real-world information.
  • Developing Search-Augmented LLMs: Provides a foundation for models that integrate web search and reasoning into creative tasks.
  • Further RL Training: Serves as a robust base model for subsequent reinforcement learning experiments in agentic image generation.