Model Overview
orbit-ai/orbit-4b-ablation-training-mix-124-v0.1 is a 4-billion-parameter model based on the Qwen3-4B architecture, developed by orbit-ai. This checkpoint is an ablation model from the ORBIT project, studying the impact of data mixing ratios (1:2:4 of NQ:HotpotQA:ORBIT datasets) during training. It is fine-tuned with GRPO (Group Relative Policy Optimization) to act as an expert open search agent that uses web search tools for multi-turn question answering.
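The 1:2:4 mixing ratio can be read as per-dataset sampling weights. A minimal sketch of how such ratios translate into sampling probabilities (the actual training pipeline is not published in this card, so `mix_weights` is a hypothetical helper):

```python
# Hypothetical illustration of the 1:2:4 NQ:HotpotQA:ORBIT training mix.
ratios = {"NQ": 1, "HotpotQA": 2, "ORBIT": 4}

def mix_weights(ratios: dict[str, int]) -> dict[str, float]:
    """Normalize raw mixing ratios into per-dataset sampling probabilities."""
    total = sum(ratios.values())
    return {name: r / total for name, r in ratios.items()}

weights = mix_weights(ratios)
# Under this ratio, an ORBIT example is drawn 4x as often as an NQ example.
```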
Key Capabilities
- Tool-use for Web Search: Designed to integrate with a live DDGS (DuckDuckGo Search)-based retriever, enabling it to perform web searches to answer complex, multi-turn questions.
- Multi-hop Reasoning: Trained on datasets like HotpotQA and ORBIT, which emphasize multi-hop and difficult reasoning queries.
- RL-trained Agent: Trained with reinforcement learning (GRPO) to optimize how it interacts with external tools.
- Ablation Study: Represents a specific training configuration for research into data mixing strategies for search agents.
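The multi-turn tool-use pattern described above can be sketched as a simple agent loop. This is an illustrative assumption, not the model's actual interface: `generate` and `web_search` are hypothetical stand-ins for the model and a DDGS-based retriever, and the `SEARCH:`/`ANSWER:` action format is invented for the sketch.

```python
# Minimal sketch of a multi-turn search-agent loop (hypothetical interface).
def web_search(query: str) -> str:
    # Stand-in for a live DDGS-based retriever call.
    return f"results for: {query}"

def generate(history: list[str]) -> str:
    # Stand-in for model generation: the agent emits either a search
    # action or a final answer based on the conversation so far.
    if not any(turn.startswith("OBSERVATION:") for turn in history):
        return "SEARCH: example query"
    return "ANSWER: example answer"

def run_agent(question: str, max_turns: int = 4) -> str:
    history = [f"QUESTION: {question}"]
    for _ in range(max_turns):
        action = generate(history)
        history.append(action)
        if action.startswith("ANSWER:"):
            return action.removeprefix("ANSWER:").strip()
        query = action.removeprefix("SEARCH:").strip()
        # Feed retrieved evidence back into the context for the next turn.
        history.append(f"OBSERVATION: {web_search(query)}")
    return "no answer within turn budget"
```

The key design point the loop illustrates is that retrieved evidence is appended to the context between turns, so the model can issue follow-up searches for multi-hop questions.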
Good For
- Research into RL-based Tool-use: Ideal for researchers exploring reinforcement learning techniques for training language models to use external tools.
- Multi-turn Retrieval-Augmented Reasoning: Suitable for investigating how models can effectively perform multi-turn question answering by augmenting their knowledge with real-time web search.
- Understanding Data Mixing Impact: Useful for studying the effects of different dataset ratios on the performance of search agents.

Users seeking a general-purpose model are advised to use orbit-ai/orbit-4b-v0.1.