inclusionAI/DR-Venus-4B-RL
DR-Venus-4B-RL by inclusionAI is a 4-billion-parameter deep research agent trained with reinforcement learning and built on Qwen3-4B-Thinking-2507. It is designed for long-horizon web research: it calls search and visit tools explicitly, collects evidence, and generates answers from that evidence. The reinforcement learning stage targets execution reliability in multi-step retrieval and browsing tasks, and the model supports a maximum context length of 256K tokens.
Overview
DR-Venus-4B-RL is a 4-billion-parameter deep research agent developed by inclusionAI. It is built on the Qwen3-4B-Thinking-2507 base model and fine-tuned from the DR-Venus-4B-SFT checkpoint. Training uses agentic reinforcement learning with IGPO-style information gain rewards and format-aware turn-level supervision, which improves execution reliability over long tool-use trajectories in complex web research tasks.
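One reading of an information gain reward is to score each turn by how much it raises the probability the policy assigns to the reference answer. The sketch below assumes that definition; the function name, the input list, and the example probabilities are all hypothetical, and the exact reward used to train DR-Venus-4B-RL is not specified here.

```python
from typing import List


def information_gain_rewards(answer_probs: List[float]) -> List[float]:
    """Per-turn reward = change in probability of the reference answer.

    answer_probs[0] is the probability before any tool call, and
    answer_probs[t] is the probability after turn t. This is an
    illustrative assumption, not the model's documented reward.
    """
    if len(answer_probs) < 2:
        return []
    return [after - before for before, after in zip(answer_probs, answer_probs[1:])]


# Hypothetical probabilities over a three-turn trajectory.
probs = [0.05, 0.10, 0.40, 0.35]
rewards = information_gain_rewards(probs)
```

Under this definition a turn that retrieves useful evidence earns a positive reward, a turn that misleads the policy earns a negative one, and the rewards sum to the total change in answer probability over the trajectory.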
Key Capabilities
- Long-Horizon Deep Research: Designed for multi-step web research using the search and visit tools.
- Enhanced Execution Reliability: Improves performance beyond supervised imitation through agentic RL.
- Evidence-Grounded Answering: Focuses on collecting and using evidence to generate answers.
- High Context Length: Supports a maximum rollout context length of 256K tokens.
- Benchmark Improvements: Shows gains over its SFT checkpoint and other models under 9B parameters on deep research benchmarks such as BrowseComp (+2.3), xBench-DS-2505 (+5.7), and DeepSearchQA (+1.9).
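The search, visit, collect evidence, then answer pattern described above can be sketched as a simple loop. This is a minimal illustration, not the official DR-Venus inference pipeline: the `run_research` function, the action tuple format, the step budget, and the stub tools and scripted policy are all hypothetical stand-ins for a real search backend and the model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool signature: takes one string argument, returns text.
Tool = Callable[[str], str]
Policy = Callable[[str, List[str]], Tuple[str, str]]


def run_research(
    question: str,
    policy: Policy,
    tools: Dict[str, Tool],
    max_steps: int = 8,
) -> Tuple[str, List[str]]:
    """Alternate tool calls and evidence collection until the policy answers."""
    evidence: List[str] = []
    for _ in range(max_steps):
        action, argument = policy(question, evidence)
        if action == "answer":
            return argument, evidence
        if action not in tools:
            # Malformed action: record it so the policy can recover next turn.
            evidence.append(f"error: unknown tool {action!r}")
            continue
        evidence.append(tools[action](argument))
    return "", evidence  # step budget exhausted without an answer


# Stub tools standing in for a real search engine and page fetcher.
def fake_search(query: str) -> str:
    return f"results for {query}: https://example.com/doc"


def fake_visit(url: str) -> str:
    return f"page text from {url}"


# Scripted policy standing in for the model: search, then visit, then answer.
def scripted_policy(question: str, evidence: List[str]) -> Tuple[str, str]:
    if not evidence:
        return "search", question
    if len(evidence) == 1:
        return "visit", "https://example.com/doc"
    return "answer", f"answer grounded in {len(evidence)} pieces of evidence"


answer, evidence = run_research(
    "example question",
    scripted_policy,
    {"search": fake_search, "visit": fake_visit},
)
```

Recording malformed actions as evidence, rather than raising, mirrors the execution reliability concern: a long trajectory should survive a single bad tool call.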
Good For
- Applications requiring long-horizon deep research with tool-augmented reasoning.
- Scenarios where improving execution reliability in multi-step processes is critical.
- Use cases demanding evidence-grounded answering through web browsing and retrieval.
- Deployment within the official DR-Venus inference pipeline.