RLinf/WideSeek-R1-4b

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 4, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

RLinf/WideSeek-R1-4b is a 4 billion parameter multi-agent system developed by RLinf, designed for broad information seeking tasks. It utilizes a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to enable scalable orchestration and parallel execution. This model explores width scaling, allowing it to achieve performance comparable to much larger single-agent models on complex information retrieval benchmarks.

Loading preview...

WideSeek-R1-4B: Width Scaling for Broad Information Seeking

WideSeek-R1-4B is a 4 billion parameter model developed by RLinf that introduces a novel approach to tackling broad information-seeking tasks through width scaling in multi-agent systems. Unlike traditional depth scaling that focuses on single-agent reasoning, WideSeek-R1 emphasizes parallel execution and scalable orchestration.

Key Capabilities & Innovations

  • Multi-Agent Reinforcement Learning (MARL): The model is trained using MARL within a lead-agent-subagent framework, optimizing both the lead agent and parallel subagents for synergistic operation.
  • Parallel Execution: It addresses the limitations of existing multi-agent systems by enabling effective parallelization of work, utilizing a shared LLM with isolated contexts and specialized tools.
  • Broad Information Seeking: Specifically designed and trained on a curated dataset of 20,000 broad information-seeking tasks.
  • Performance: WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, demonstrating performance comparable to single-agent models like DeepSeek-R1-671B, despite its significantly smaller size.
  • Scalability: Exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of its width-scaling approach.

When to Use This Model

WideSeek-R1-4B is particularly well-suited for applications requiring:

  • Complex Information Retrieval: Tasks that benefit from parallel exploration and synthesis of information.
  • Multi-Agent System Development: As a foundation for building and experimenting with multi-agent architectures that prioritize parallel processing over sequential depth.
  • Resource-Efficient Solutions: When seeking performance comparable to much larger models for broad information tasks, but with a smaller parameter footprint.