InfoSeeker-4B Reproduction with Qwen3-4B
This model, orbit-ai/infoseeker-repro-4b, is a 4-billion-parameter open search agent developed by orbit-ai. It is a reproduction of the InfoSeeker model, built on the Qwen3-4B base and fine-tuned with a single-stage GRPO (Group Relative Policy Optimization) run. The model is designed for multi-turn question answering, integrating live web search as a tool.
Key Capabilities
- Retrieval-Augmented Generation (RAG): Utilizes a live DDGS-based retriever to gather information from multiple search backends (Google, Brave, Bing, Wikipedia, Grokipedia).
- Multi-turn Reasoning: Capable of engaging in multi-turn dialogues, issuing search queries, processing observations, and formulating answers.
- RL-based Tool Use: Trained for 165 GRPO steps using the verl-tool framework, optimizing its ability to interact with external search tools.
- Diverse QA Handling: Fine-tuned on a mixed dataset comprising Natural Questions (single-hop), HotpotQA (multi-hop), and InfoSeek (harder, reasoning-intensive multi-hop queries).
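The multi-turn loop described above (issue a search query, read the observation, answer) can be sketched as a simple driver. The `<search>`/`<observation>`/`<answer>` tag names and the stub policy/retriever below are illustrative assumptions; the actual InfoSeeker prompt format is not specified in this card.

```python
import re

# Hypothetical tag conventions: the real prompt format may differ.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def run_agent(policy, retriever, question, max_turns=5):
    """Drive a multi-turn loop: each turn the policy either issues a
    <search> query (answered with an <observation>) or emits a final
    <answer>, which ends the episode."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        completion = policy(transcript)
        transcript += completion + "\n"
        answer = ANSWER_RE.search(completion)
        if answer:
            return answer.group(1).strip()
        query = SEARCH_RE.search(completion)
        if query:
            # Append the tool result so the next turn can condition on it.
            obs = retriever(query.group(1).strip())
            transcript += f"<observation>{obs}</observation>\n"
    return None  # no answer within the turn budget

# Stub policy and retriever, just to illustrate the control flow.
def stub_policy(transcript):
    if "<observation>" not in transcript:
        return "<search>capital of France</search>"
    return "<answer>Paris</answer>"

def stub_retriever(query):
    return "Paris is the capital of France."
```

In the trained model the `policy` call is a generation step of the language model and `retriever` is the live search tool; the loop structure is the same.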
Good for
- Research into RL-based tool-use training: Ideal for exploring and advancing methodologies for training language models to effectively use external tools.
- Multi-turn retrieval-augmented reasoning: Suitable for experiments requiring models to perform complex reasoning over multiple steps, leveraging search results.
- Understanding search agent behavior: Provides a platform to analyze how models break down questions, plan solutions, and integrate search observations.
Note: Optimal performance requires a live web search backend. Without one, the model falls back on its parametric knowledge alone, which may reduce accuracy on fine-grained factual questions. For full details, refer to the ORBIT paper.
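Retrieval over several search backends, as described under Key Capabilities, implies merging ranked result lists into one context. A minimal sketch of such a merge is below; the round-robin-with-deduplication policy and the input format are assumptions, and in practice each list would come from a live backend (e.g. via the DDGS retriever) rather than be passed in directly.

```python
def merge_results(per_backend_results, k=5):
    """Interleave ranked (url, snippet) lists from several backends,
    dropping duplicate URLs, and keep the top-k overall.

    per_backend_results: dict mapping backend name -> ranked list of
    (url, snippet) tuples. The round-robin over backends keeps any
    single backend from dominating the merged context.
    """
    seen, merged = set(), []
    longest = max((len(r) for r in per_backend_results.values()), default=0)
    for rank in range(longest):
        for backend, results in per_backend_results.items():
            if rank < len(results):
                url, snippet = results[rank]
                if url not in seen:
                    seen.add(url)
                    merged.append(
                        {"backend": backend, "url": url, "snippet": snippet}
                    )
    return merged[:k]
```

The merged snippets would then be formatted into the `<observation>` block the model conditions on for its next turn.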