Overview
DR-Tulu-8B: An RL-Trained Deep Research Agent
DR-Tulu-8B is an 8-billion-parameter model from rl-research, representing the Reinforcement Learning (RL) checkpoint of the DR Tulu project. It is built on top of the supervised fine-tuned (SFT) model, rl-research/DR-Tulu-SFT-8B, and has been specifically trained for advanced tool-use capabilities.
Key Capabilities & Differentiators
- RL-Trained for Tool-Use: Unlike many general-purpose LLMs, DR-Tulu-8B has undergone specialized RL training using the dr-agent-lib framework, making it highly effective for tasks requiring external tool interaction.
- Superior Research Performance: Benchmarks show significant improvements over its SFT base model and other 8B models on research-focused datasets. For instance, it achieves 88.3% on SQAv2, 52.8% on HealthBench, and 45.4% on DeepResearch Bench, outperforming Qwen3-8B and DR-Tulu-SFT-8B.
- Optimized for Deep Research: The model is designed to function as an open deep research agent, leveraging its tool-use proficiency to tackle complex information retrieval and synthesis tasks.
Important Usage Notes
- Requires dr-agent-lib: Due to its specialized training, DR-Tulu-8B is not intended for out-of-the-box use with standard HuggingFace or vLLM inference. Users must integrate it with the dr-agent-lib framework for optimal performance.
- Research-Oriented: This model is primarily intended for research and educational use, aligning with Ai2's Responsible Use Guidelines.
For detailed information, including training scripts and hyperparameter specifics, refer to the DR Tulu paper and the GitHub repository.