GAIR/DeepResearcher-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 3, 2025 · License: apache-2.0 · Architecture: Transformer

GAIR/DeepResearcher-7b is a 7.6 billion parameter large language model built on the Qwen2.5-7B-Instruct architecture and designed specifically for deep research tasks. It was trained with the first comprehensive framework for training LLM-based deep research agents via reinforcement learning (RL) in real-world web search environments. The model excels at formulating plans, cross-validating information, self-reflecting to redirect its research, and remaining honest when no definitive answer can be found. DeepResearcher-7b is optimized for end-to-end research agent capabilities and outperforms baselines in both in-domain and out-of-domain question-answering scenarios.


DeepResearcher-7b: An RL-Trained Deep Research Agent

GAIR/DeepResearcher-7b is a 7.6 billion parameter large language model, fine-tuned from the Qwen2.5-7B-Instruct architecture. It represents a novel approach to creating LLM-based research agents through end-to-end reinforcement learning (RL) in real-world web search environments. The model leverages authentic web interactions to develop advanced research capabilities.

Key Capabilities & Features

  • Emergent Cognitive Behaviors: Through RL training, DeepResearcher-7b exhibits advanced behaviors such as formulating research plans, cross-validating information from multiple sources, and self-reflection to adapt its research strategy.
  • Honesty & Transparency: The model is designed to acknowledge when it cannot find definitive answers, promoting reliable information retrieval.
  • Reinforcement Learning (RL) Training: Trained with the Group Relative Policy Optimization (GRPO) algorithm on open-domain question-answering datasets including NaturalQuestions, TriviaQA, HotpotQA, and 2WikiMultiHopQA.
  • Robust Performance: Demonstrates significant improvements over baseline models in task completion, particularly on challenging out-of-domain benchmarks such as MuSiQue, Bamboogle, and PopQA.
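To make the GRPO bullet above concrete, here is a minimal sketch of the group-relative advantage at the heart of that algorithm: several rollouts are sampled for the same question, and each rollout's reward is normalized against its group's statistics rather than against a learned value function. The rewards, group size, and normalization details below are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of GRPO's group-relative advantage (illustrative, not the
# authors' exact training code).
from statistics import mean, stdev

def grpo_advantages(rewards):
    """Normalize each rollout's reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# One group of 4 rollouts for the same research question:
# two found the correct answer (reward 1.0), two did not (reward 0.0).
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Successful rollouts get positive advantages, failed ones negative,
# and the advantages sum to zero within the group.
```

Because the baseline is computed from the group itself, rollouts are only rewarded for doing better than their peers on the same question, which is what lets GRPO train without a separate critic model.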

Use Cases & Differentiators

DeepResearcher-7b is ideal for applications requiring autonomous, in-depth information gathering and synthesis from web sources. Its primary differentiator is its RL-driven training in real-world environments, which fosters more human-like research strategies and adaptability compared to models trained solely on static datasets. This makes it particularly suitable for complex question-answering, investigative tasks, and scenarios where dynamic information validation is crucial.
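The plan, search, and self-reflect behavior described above can be sketched as a simple agent loop. Note that `ask_model` and `web_search` are hypothetical stand-ins for the model and a search tool; the card does not specify the actual tool interface, and the stopping convention here is an assumption for illustration.

```python
# Hedged sketch of a plan -> search -> reflect research loop.
# `ask_model` and `web_search` are hypothetical callables, not the
# authors' actual tool-calling interface.
def research(question, ask_model, web_search, max_steps=4):
    notes = []
    for _ in range(max_steps):
        # The model reflects on its notes and either issues a new
        # search query or commits to an answer.
        reply = ask_model(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with a search query, or 'ANSWER: <text>' if done."
        )
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        notes.append(web_search(reply))  # gather evidence to cross-validate
    # Honesty when the evidence budget runs out, as the card describes.
    return "No definitive answer found."
```

The explicit fallback return mirrors the model's trained behavior of acknowledging when it cannot find a definitive answer instead of guessing.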