WideSeek-R1-4B: Width Scaling for Broad Information Seeking
WideSeek-R1-4B is a 4-billion-parameter model developed by RLinf that tackles broad information-seeking tasks through width scaling in a multi-agent system. Where traditional depth scaling focuses on single-agent reasoning, WideSeek-R1 emphasizes parallel execution and scalable orchestration across multiple agents.
Key Capabilities & Innovations
- Multi-Agent Reinforcement Learning (MARL): The model is trained using MARL within a lead-agent-subagent framework, optimizing both the lead agent and parallel subagents for synergistic operation.
- Parallel Execution: It addresses the limitations of existing multi-agent systems by enabling effective parallelization of work, utilizing a shared LLM with isolated contexts and specialized tools.
- Broad Information Seeking: The model is designed for broad information seeking and trained on a curated dataset of 20,000 such tasks.
- Performance: WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, comparable to single-agent models such as DeepSeek-R1-671B despite being far smaller.
- Scalability: Performance improves consistently as the number of parallel subagents increases, which supports the effectiveness of the width-scaling approach.
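To make the item F1 figure in the Performance bullet concrete, the sketch below computes F1 as the harmonic mean of precision and recall over sets of items. It assumes that items can be compared by exact match; the WideSearch benchmark's own matching and aggregation rules may differ, and the function name `item_f1` is hypothetical.

```python
def item_f1(predicted: set, reference: set) -> float:
    """F1 over items, assuming exact-match comparison (an illustrative assumption)."""
    # Number of predicted items that also appear in the reference set.
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # Also covers the case where either set is empty.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Two of three predicted items are correct; two of four reference items are found.
score = item_f1({"a", "b", "c"}, {"a", "b", "d", "e"})
```

Here precision is 2/3 and recall is 1/2, so the score is 4/7, or roughly 57%.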
When to Use This Model
WideSeek-R1-4B is particularly well-suited for applications requiring:
- Complex Information Retrieval: Tasks that benefit from parallel exploration and synthesis of information.
- Multi-Agent System Development: As a foundation for building and experimenting with multi-agent architectures that prioritize parallel processing over sequential depth.
- Resource-Efficient Solutions: When seeking performance comparable to much larger models for broad information tasks, but with a smaller parameter footprint.
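For the multi-agent use case, the lead-agent and subagent pattern described under Key Capabilities can be sketched roughly as follows. This is a minimal illustration, not the model's actual orchestration code: the names `run_subagent`, `lead_agent`, and `echo_llm` are hypothetical, the LLM is represented by a plain callable, and the subtasks are passed in directly rather than produced by the lead agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for the shared LLM: takes a context (list of
# messages) and returns a text response.
LLM = Callable[[List[str]], str]


def run_subagent(llm: LLM, subtask: str) -> str:
    # Each subagent starts from a fresh context that holds only its own
    # subtask, so contexts stay isolated even though the LLM is shared.
    context = [f"Subtask: {subtask}"]
    return llm(context)


def lead_agent(llm: LLM, query: str, subtasks: List[str], width: int) -> str:
    # Run up to `width` subagents in parallel; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=width) as pool:
        results = list(pool.map(lambda s: run_subagent(llm, s), subtasks))
    # The lead agent sees only the query and the subagents' results.
    lead_context = [f"Query: {query}"] + [f"Result: {r}" for r in results]
    return llm(lead_context)


def echo_llm(context: List[str]) -> str:
    # Toy LLM used for illustration: joins whatever context it receives.
    return " | ".join(context)


answer = lead_agent(echo_llm, "q", ["a", "b"], width=2)
```

Replacing `echo_llm` with a real model call, and raising `width`, is where the parallel subagent count discussed above would come into play.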