Overview
DR-Tulu-8B: A Deep Research Agent for Tool-Use
DR-Tulu-8B is an 8-billion-parameter language model developed by rl-research and designed as an open deep research agent. It is a reinforcement learning (RL) checkpoint: it starts from its supervised fine-tuned predecessor, DR-Tulu-SFT-8B, and is trained further on the rl-research/dr-tulu-rl-data dataset.
Key Capabilities & Differentiators
- Specialized Tool-Use: This model is explicitly trained for tool use with the dr-agent-lib framework, which distinguishes it from general-purpose LLMs. It is optimized for interacting with external tools and systems to perform research tasks.
- Enhanced Research Performance: DR-Tulu-8B shows significant improvements over its SFT base model and other 8B-class models like Qwen3-8B across various research-oriented benchmarks. It achieves an average score of 61.1% on the DeepResearch Bench, outperforming DR-Tulu-SFT-8B (56.0%) and Qwen3-8B (38.6% with a specialized search pipeline).
- Strong Benchmark Results: Notable scores include 86.8% on SQAv2, 50.2% on HealthBench, and 74.3% on ResearchQA, indicating its proficiency in complex question answering and information synthesis.
Intended Use Cases
DR-Tulu-8B is primarily intended for research and educational use, particularly in scenarios requiring advanced information retrieval, complex question answering, and automated research tasks through tool integration. Its design makes it suitable for applications where an agent needs to interact with external knowledge sources or APIs to fulfill deep research queries.
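To make the idea of an agent that calls external tools more concrete, here is a minimal sketch of a generic tool-use loop in Python. It is not the dr-agent-lib API and does not load DR-Tulu-8B: the names `Step`, `stub_policy`, `stub_search`, and `run_agent` are hypothetical, the policy is a hard-coded stand-in for the model, and the search tool is a tiny in-memory lookup. It only illustrates the pattern of alternating tool calls and observations until an answer is produced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One decision by the agent: call a tool or give the final answer."""
    kind: str            # "tool" or "answer"
    name: str = ""       # tool name when kind == "tool"
    argument: str = ""   # tool input when kind == "tool"
    text: str = ""       # final answer when kind == "answer"


def stub_search(query: str) -> str:
    # Hypothetical search tool backed by a tiny in-memory corpus.
    corpus = {"dr-tulu size": "DR-Tulu-8B has 8 billion parameters."}
    return corpus.get(query, "no results")


def stub_policy(question: str, observations: List[str]) -> Step:
    # Stand-in for the model: search once, then answer from the observation.
    if not observations:
        return Step(kind="tool", name="search", argument="dr-tulu size")
    return Step(kind="answer", text=f"Based on search: {observations[-1]}")


def run_agent(
    question: str,
    policy: Callable[[str, List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 4,
) -> str:
    """Alternate between policy decisions and tool calls until an answer."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = policy(question, observations)
        if step.kind == "answer":
            return step.text
        tool = tools.get(step.name)
        # Feed the tool result (or an error note) back to the policy.
        observations.append(
            tool(step.argument) if tool else f"unknown tool: {step.name}"
        )
    return "step budget exhausted"


answer = run_agent("How large is DR-Tulu-8B?", stub_policy, {"search": stub_search})
print(answer)
```

In a real deployment the stub policy would be replaced by calls to the model and the stub tool by actual search or API clients. The `max_steps` cap is one simple way to keep the loop bounded.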