orbit-ai/infoseeker-repro-4b

- Task: Text generation
- Model size: 4B parameters
- Precision: BF16
- Context length: 32k
- Published: Mar 9, 2026
- License: apache-2.0
- Architecture: Transformer (open weights)

orbit-ai/infoseeker-repro-4b is a 4-billion parameter open search agent based on the Qwen3-4B architecture, fine-tuned by orbit-ai with GRPO for multi-turn question answering with live web search. The model targets retrieval-augmented reasoning, using a DDGS-based retriever to answer complex queries across datasets such as Natural Questions, HotpotQA, and InfoSeek. Its primary use case is research into RL-based tool-use training and multi-turn retrieval-augmented reasoning; an active search backend is required for optimal performance.


InfoSeeker-4B Reproduction with Qwen3-4B

This model, orbit-ai/infoseeker-repro-4b, is a 4-billion parameter open search agent developed by orbit-ai. It is a reproduction of the InfoSeeker model, built upon the Qwen3-4B base and fine-tuned in a single training stage using GRPO (Group Relative Policy Optimization). The model is specifically designed for multi-turn question answering by integrating live web search as a tool.
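As a rough illustration of the training objective (a sketch, not the actual training code), GRPO's core idea is to score each sampled rollout for a question against the other rollouts in its group, normalizing rewards by the group mean and standard deviation:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each rollout's reward
    against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one question; two of them found the answer (reward 1.0).
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Successful rollouts get a positive advantage, failed ones a negative one,
# so the policy update needs no separate value/critic model.
```

In a search-agent setting the reward would typically come from answer correctness (e.g. exact match against the gold answer), but the exact reward function used here is not specified on this card.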

Key Capabilities

  • Retrieval-Augmented Generation (RAG): Utilizes a live DDGS-based retriever to gather information from multiple search backends (Google, Brave, Bing, Wikipedia, Grokipedia).
  • Multi-turn Reasoning: Capable of engaging in multi-turn dialogues, issuing search queries, processing observations, and formulating answers.
  • RL-based Tool Use: Trained with 165 GRPO steps using the verl-tool framework, optimizing its ability to interact with external search tools.
  • Diverse QA Handling: Fine-tuned on a mixed dataset including Natural Questions (single-hop), HotpotQA (multi-hop), and InfoSeek (more difficult, reasoning-intensive multi-hop queries).
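The multi-turn loop described above (issue a search query, read the observation, answer) can be sketched as follows. Note the tag names (`<search>`, `<answer>`), the transcript format, and the stubbed policy and retriever are illustrative assumptions, not the model's actual protocol:

```python
import re

def run_agent(question, generate, retrieve, max_turns=4):
    """Minimal multi-turn search-agent loop: at each turn the policy either
    issues a <search>query</search> action or emits a final <answer>."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = generate(transcript)
        transcript += step + "\n"
        answer = re.search(r"<answer>(.*?)</answer>", step, re.S)
        if answer:
            return answer.group(1).strip()
        query = re.search(r"<search>(.*?)</search>", step, re.S)
        if query:
            # In the real agent this would call the DDGS-based retriever.
            docs = retrieve(query.group(1).strip())
            transcript += f"Observation: {docs}\n"
    return None

# Toy stand-ins for the fine-tuned policy and the live retriever.
def fake_generate(transcript):
    if "Observation" in transcript:
        return "<answer>Paris</answer>"
    return "<search>capital of France</search>"

def fake_retrieve(query):
    return ["Paris is the capital of France."]

print(run_agent("What is the capital of France?", fake_generate, fake_retrieve))
# → Paris
```

Swapping `fake_generate` for sampling from the model and `fake_retrieve` for a live DDGS call yields the agent described on this card.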

Good for

  • Research into RL-based tool-use training: Ideal for exploring and advancing methodologies for training language models to effectively use external tools.
  • Multi-turn retrieval-augmented reasoning: Suitable for experiments requiring models to perform complex reasoning over multiple steps, leveraging search results.
  • Understanding search agent behavior: Provides a platform to analyze how models break down questions, plan solutions, and integrate search observations.

Note: Optimal performance requires a live web search backend. Without one, the model relies solely on its parametric knowledge, which may reduce accuracy on fine-grained questions. For full details, refer to the ORBIT paper.