inclusionAI/DR-Venus-4B-RL

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Apr 21, 2026 | Architecture: Transformer

DR-Venus-4B-RL by inclusionAI is a 4-billion-parameter deep research agent trained with reinforcement learning and built on Qwen3-4B-Thinking-2507. It is designed for long-horizon web research: it calls explicit search and visit tools, collects evidence, and generates an answer from that evidence. The RL stage targets execution reliability in multi-step retrieval and browsing tasks, with a maximum rollout context length of 256K tokens.


Overview

DR-Venus-4B-RL is a 4-billion-parameter deep research agent developed by inclusionAI. It is built on the Qwen3-4B-Thinking-2507 base model and fine-tuned from the DR-Venus-4B-SFT checkpoint. Training uses agentic reinforcement learning with IGPO-style information gain rewards and format-aware turn-level supervision. The goal of this approach is to improve execution reliability over long tool-use trajectories, which matters most in complex web research tasks.
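To make the idea of an information gain reward concrete, here is a minimal sketch. It assumes the common reading of IGPO-style rewards, where each turn is credited with the change in the probability the policy assigns to the ground-truth answer after that turn. The function name `information_gain_rewards` and the probability values are hypothetical; this is not the DR-Venus training code, and it does not model the format-aware supervision term.

```python
from typing import List


def information_gain_rewards(answer_probs: List[float]) -> List[float]:
    """Turn-level rewards as the change in ground-truth answer probability.

    answer_probs[0] is the probability before any tool call, and
    answer_probs[t] is the probability after turn t (assumed to be
    supplied by the caller; how it is estimated is out of scope here).
    """
    if len(answer_probs) < 2:
        return []
    # Reward for turn t = p(answer | history up to t) - p(answer | history up to t-1)
    return [after - before for before, after in zip(answer_probs, answer_probs[1:])]


# Hypothetical trajectory: the second turn retrieves a misleading page,
# so its reward is negative; the third turn finds the key evidence.
probs = [0.05, 0.20, 0.15, 0.60]
rewards = information_gain_rewards(probs)
```

Under this formulation the per-turn rewards sum to the total change in answer probability over the trajectory, so a turn that adds no useful evidence earns roughly zero even when the final answer is correct.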

Key Capabilities

  • Long-Horizon Deep Research: Designed for multi-step web research, utilizing search and visit tools.
  • Enhanced Execution Reliability: Improves performance beyond supervised imitation through agentic RL.
  • Evidence-Grounded Answering: Focuses on collecting and using evidence to generate answers.
  • High Context Length: Supports a maximum rollout context length of 256K tokens.
  • Benchmark Improvements: Reports gains over its SFT checkpoint and over other models under 9B parameters on deep research benchmarks, including BrowseComp (+2.3), xBench-DS-2505 (+5.7), and DeepSearchQA (+1.9).
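The search, visit, and answer pattern described in the list above can be sketched as a simple loop. Everything here is illustrative: the `search` and `visit` functions are stubs, `run_episode` and `scripted_policy` are hypothetical names, and the real DR-Venus pipeline, tool schemas, and prompt format may differ. In practice the policy would be the model itself rather than a fixed script.

```python
from typing import Callable, Dict, List, Tuple


# Stub tools standing in for a real search engine and page fetcher (assumptions).
def search(query: str) -> str:
    return f"[stub results for: {query}]"


def visit(url: str) -> str:
    return f"[stub page content from: {url}]"


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "visit": visit}

# An action is (name, argument); the name "answer" ends the episode.
Action = Tuple[str, str]


def run_episode(
    policy: Callable[[List[str]], Action],
    question: str,
    max_turns: int = 8,
) -> Tuple[str, List[str]]:
    """Run one tool-use trajectory and return (answer, collected history)."""
    history = [f"question: {question}"]
    for _ in range(max_turns):
        name, arg = policy(history)
        if name == "answer":
            return arg, history
        if name not in TOOLS:
            # Malformed tool calls are recorded so the policy can recover.
            history.append(f"error: unknown tool {name!r}")
            continue
        history.append(f"{name}({arg!r}) -> {TOOLS[name](arg)}")
    return "", history  # Turn budget exhausted without an answer.


def scripted_policy(history: List[str]) -> Action:
    """Fixed script used in place of the model for this sketch."""
    script = [
        ("search", "DR-Venus-4B-RL"),
        ("visit", "https://example.com"),
        ("answer", "done"),
    ]
    step = len(history) - 1  # Number of turns taken so far.
    return script[min(step, len(script) - 1)]


answer, evidence = run_episode(scripted_policy, "What is DR-Venus-4B-RL?")
```

The loop keeps every tool observation in `history`, which is what lets the final answer be grounded in collected evidence, and it caps the number of turns so a policy that never answers still terminates.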

Good For

  • Applications requiring long-horizon deep research with tool-augmented reasoning.
  • Scenarios where improving execution reliability in multi-step processes is critical.
  • Use cases demanding evidence-grounded answering through web browsing and retrieval.
  • Deployment within the official DR-Venus inference pipeline.