Model Overview
orbit-ai/orbit-4b-ablation-top-10-docs-v0.1 is a 4-billion-parameter language model built on the Qwen3-4B base. Developed by orbit-ai, this version is an ablation model fine-tuned for 165 GRPO (Group Relative Policy Optimization) steps. It is designed to act as an expert open search agent that uses web search as a tool for multi-turn question answering.
Key Capabilities & Training
- Tool-Use Specialization: Optimized for web search integration, enabling it to perform multi-turn question answering by issuing <search> queries and processing <information> observations.
- RL-Trained: Fine-tuned with GRPO on a mixed dataset including Natural Questions, HotpotQA, and the ORBIT dataset, with equal sampling across tasks.
- Retrieval-Augmented: Trained with a live DDGS-based retriever, processing top-10 search documents at each turn to enhance reasoning.
- Research Focus: Primarily intended for research into multi-turn retrieval-augmented reasoning and RL-based tool-use training.
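The search-and-observe loop described in the list above can be sketched roughly as follows. This is a minimal illustration, not the developers' inference code: the `run_search_agent` function, the `generate` and `search` callables, the closing `</search>` and `</information>` tags, and the way observations are appended to the transcript are all assumptions made for the example. Only the `<search>` and `<information>` markers and the top-10 document limit come from the description above. The stubs stand in for the model and for a live retriever.

```python
import re
from typing import Callable, List

# Assumed tag format: the model wraps each query in <search>...</search>.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def run_search_agent(
    question: str,
    generate: Callable[[str], str],      # hypothetical: transcript -> model continuation
    search: Callable[[str], List[str]],  # hypothetical: query -> ranked document texts
    max_turns: int = 4,
    top_k: int = 10,
) -> str:
    """Alternate between model generation and retrieval until no query is issued."""
    transcript = question
    for _ in range(max_turns):
        output = generate(transcript)
        transcript += output
        match = SEARCH_RE.search(output)
        if match is None:
            break  # no further query, so treat the output as the final answer
        # Keep only the top_k documents, mirroring the top-10 training setup.
        docs = search(match.group(1).strip())[:top_k]
        transcript += "\n<information>\n" + "\n".join(docs) + "\n</information>\n"
    return transcript


# Stub model and retriever, used only to show the control flow.
def fake_generate(text: str) -> str:
    if "<information>" not in text:
        return "<search>capital of France</search>"
    return "The capital is Paris."


def fake_search(query: str) -> List[str]:
    return [f"doc {i} about {query}" for i in range(20)]


result = run_search_agent(
    "What is the capital of France?\n", fake_generate, fake_search
)
```

In practice `generate` would wrap a call to the model and `search` would wrap a live web search backend. Without such a backend the loop has no observations to append, which corresponds to the parametric-knowledge-only case.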
Important Considerations
- Ablation Model: This is an ablation model; for general use cases, the developers recommend orbit-ai/orbit-4b-v0.1.
- Search Dependency: Optimal performance requires a live web search backend. Without it, the model relies solely on parametric knowledge, which may reduce accuracy on specific information-seeking tasks.
- English Only: The model is designed for English language tasks.
- Not for Production: Not intended for production deployment without additional safety filtering.