Overview
DR-Tulu-SFT-8B: A Tool-Use Agent for Deep Research
DR-Tulu-SFT-8B is an 8 billion parameter model developed by rl-research, serving as the Supervised Fine-Tuning (SFT) checkpoint of the DR Tulu deep research agent. Built upon the Qwen3-8B architecture, this model is specifically designed and trained for advanced tool-use capabilities using the dr-agent-lib framework.
Key Capabilities & Differentiators
- Specialized for Tool-Use: Unlike general-purpose LLMs, DR-Tulu-SFT-8B is explicitly trained to integrate and utilize external tools, making it highly effective for complex, multi-step research tasks.
- Enhanced Research Performance: The model significantly outperforms its base model, Qwen3-8B, across various research-focused benchmarks. For instance, it achieves 72.3 on SQAv2, 38.1 on HealthBench, and 39.0 on DeepResearch Bench, demonstrating superior performance in tasks requiring deep information retrieval and synthesis.
- SFT Training: It has undergone supervised fine-tuning on a dedicated dataset (
rl-research/dr-tulu-sft-data) to optimize its agentic behavior and tool interaction. - Open Deep Research Agent: Positioned as an open research agent, it aims to facilitate advanced research applications.
Intended Use Cases
- Deep Research: Ideal for applications requiring comprehensive information gathering, analysis, and synthesis from various sources.
- Agentic Systems: Best utilized within the
dr-agent-libframework for building intelligent agents that can interact with tools to solve complex problems. - Question Answering: Excels in challenging question-answering scenarios, particularly those requiring external knowledge access and reasoning.
Note: This model is optimized for the dr-agent-lib framework; direct inference with standard HuggingFace or vLLM setups may not yield optimal results. Refer to the DR Tulu GitHub repository for proper usage and integration.