InfoSeeker-4B Reproduction with Qwen3-4B
This model, orbit-ai/infoseeker-repro-4b, is a 4-billion-parameter open search agent developed by orbit-ai. It is a reproduction of the InfoSeeker model, built on the Qwen3-4B base and fine-tuned with a single-stage GRPO (Group Relative Policy Optimization) run. The model is designed for multi-turn question answering, integrating live web search as a tool.
Key Capabilities
- Retrieval-Augmented Generation (RAG): Utilizes a live DDGS-based retriever to gather information from multiple search backends (Google, Brave, Bing, Wikipedia, Grokipedia).
- Multi-turn Reasoning: Capable of engaging in multi-turn dialogues, issuing search queries, processing observations, and formulating answers.
- RL-based Tool Use: Trained for 165 GRPO steps using the verl-tool framework, optimizing its ability to interact with external search tools.
- Diverse QA Handling: Fine-tuned on a mixed dataset comprising Natural Questions (single-hop), HotpotQA (multi-hop), and InfoSeek (harder, reasoning-intensive multi-hop queries).
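The multi-turn loop described above (issue a search query, read the observation, answer) can be sketched as a simple driver. The `<search>`/`<observation>`/`<answer>` tag names and the stub policy/retriever below are illustrative assumptions; the actual InfoSeeker prompt format is not specified in this card.

```python
import re

# Hypothetical tag conventions: the real prompt format may differ.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def run_agent(policy, retriever, question, max_turns=5):
    """Drive a multi-turn loop: each turn the policy either issues a
    <search> query (answered with an <observation>) or emits a final
    <answer>, which ends the episode."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        completion = policy(transcript)
        transcript += completion + "\n"
        answer = ANSWER_RE.search(completion)
        if answer:
            return answer.group(1).strip()
        query = SEARCH_RE.search(completion)
        if query:
            # Append the tool result so the next turn can condition on it.
            obs = retriever(query.group(1).strip())
            transcript += f"<observation>{obs}</observation>\n"
    return None  # no answer within the turn budget

# Stub policy and retriever, just to illustrate the control flow.
def stub_policy(transcript):
    if "<observation>" not in transcript:
        return "<search>capital of France</search>"
    return "<answer>Paris</answer>"

def stub_retriever(query):
    return "Paris is the capital of France."
```

In the trained model the `policy` call is a generation step of the language model and `retriever` is the live search tool; the loop structure is the same.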
Good for
- Research into RL-based tool-use training: Ideal for exploring and advancing methodologies for training language models to effectively use external tools.
- Multi-turn retrieval-augmented reasoning: Suitable for experiments requiring models to perform complex reasoning over multiple steps, leveraging search results.
- Understanding search agent behavior: Provides a platform to analyze how models break down questions, plan solutions, and integrate search observations.
Note: Optimal performance requires a live web search backend. Without one, the model falls back on its parametric knowledge alone, which may reduce accuracy on fine-grained factual questions. For full details, refer to the ORBIT paper.
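Retrieval over several search backends, as described under Key Capabilities, implies merging ranked result lists into one context. A minimal sketch of such a merge is below; the round-robin-with-deduplication policy and the input format are assumptions, and in practice each list would come from a live backend (e.g. via the DDGS retriever) rather than be passed in directly.

```python
def merge_results(per_backend_results, k=5):
    """Interleave ranked (url, snippet) lists from several backends,
    dropping duplicate URLs, and keep the top-k overall.

    per_backend_results: dict mapping backend name -> ranked list of
    (url, snippet) tuples. The round-robin over backends keeps any
    single backend from dominating the merged context.
    """
    seen, merged = set(), []
    longest = max((len(r) for r in per_backend_results.values()), default=0)
    for rank in range(longest):
        for backend, results in per_backend_results.items():
            if rank < len(results):
                url, snippet = results[rank]
                if url not in seen:
                    seen.add(url)
                    merged.append(
                        {"backend": backend, "url": url, "snippet": snippet}
                    )
    return merged[:k]
```

The merged snippets would then be formatted into the `<observation>` block the model conditions on for its next turn.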