rl-research/DR-Tulu-SFT-8B

8B parameters · FP8 · 32768 context length
License: apache-2.0

DR-Tulu-SFT-8B: A Tool-Use Agent for Deep Research

DR-Tulu-SFT-8B is an 8-billion-parameter model developed by rl-research, serving as the supervised fine-tuning (SFT) checkpoint of the DR Tulu deep research agent. Built on the Qwen3-8B architecture, this model is specifically designed and trained for advanced tool-use capabilities within the dr-agent-lib framework.

Key Capabilities & Differentiators

  • Specialized for Tool-Use: Unlike general-purpose LLMs, DR-Tulu-SFT-8B is explicitly trained to integrate and utilize external tools, making it highly effective for complex, multi-step research tasks.
  • Enhanced Research Performance: The model significantly outperforms its base model, Qwen3-8B, across various research-focused benchmarks. For instance, it achieves 72.3 on SQAv2, 38.1 on HealthBench, and 39.0 on DeepResearch Bench, demonstrating superior performance in tasks requiring deep information retrieval and synthesis.
  • SFT Training: Supervised fine-tuning on a dedicated dataset (rl-research/dr-tulu-sft-data) optimizes its agentic behavior and tool interaction.
  • Open Deep Research Agent: Positioned as an open research agent, it aims to facilitate advanced research applications.
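Concretely, "tool use" here means the model emits structured tool-call markup in its generations, which a surrounding harness must parse and execute. The real tag format is defined by dr-agent-lib; the sketch below assumes a hypothetical `<tool_call>{...}</tool_call>` JSON convention purely for illustration:

```python
import json
import re

# Hypothetical tag format for illustration only; the actual format is
# defined by dr-agent-lib, not by this sketch.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(generation: str) -> list[dict]:
    """Pull every well-formed JSON tool call out of a model generation."""
    calls = []
    for match in TOOL_CALL_RE.finditer(generation):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed calls rather than crash the harness
    return calls

output = (
    "To answer this I need recent papers. "
    '<tool_call>{"name": "search", "arguments": {"query": "deep research agents"}}</tool_call>'
)
print(extract_tool_calls(output))
# → [{'name': 'search', 'arguments': {'query': 'deep research agents'}}]
```

The harness would then dispatch each parsed call to the matching tool and feed the result back into the context for the next generation step.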

Intended Use Cases

  • Deep Research: Ideal for applications requiring comprehensive information gathering, analysis, and synthesis from various sources.
  • Agentic Systems: Best utilized within the dr-agent-lib framework for building intelligent agents that can interact with tools to solve complex problems.
  • Question Answering: Excels in challenging question-answering scenarios, particularly those requiring external knowledge access and reasoning.

Note: This model is optimized for the dr-agent-lib framework; direct inference with standard Hugging Face Transformers or vLLM setups may not yield optimal results. Refer to the DR Tulu GitHub repository for proper usage and integration.
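For readers who still want to prototype outside dr-agent-lib, the overall generate → execute-tool → continue loop such a framework implements can be sketched in plain Python. Everything below — the `generate` stub (standing in for the model behind an inference server), the `CALL`/`FINAL` convention, and the tool registry — is illustrative, not the dr-agent-lib API:

```python
# Illustrative agentic loop. generate() is a stub standing in for the
# model (e.g. DR-Tulu-SFT-8B served via vLLM); the CALL/FINAL text
# convention is invented here for the sketch.
def generate(messages: list[dict]) -> str:
    """Stub model: requests the search tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return 'CALL search("deep research agents")'
    return "FINAL: Deep research agents combine retrieval with synthesis."

def search(query: str) -> str:
    """Stub tool returning a canned snippet."""
    return f"Top result for {query!r}: ..."

TOOLS = {"search": search}

def run_agent(question: str, max_turns: int = 4) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        # Crude parse of the stub's CALL convention: name("arg")
        name, _, arg = reply.partition("(")
        name = name.removeprefix("CALL ").strip()
        result = TOOLS[name](arg.rstrip(")").strip('"'))
        messages.append({"role": "tool", "content": result})
    return "No answer within turn budget."

print(run_agent("What are deep research agents?"))
# → Deep research agents combine retrieval with synthesis.
```

A real deployment replaces the stubs with actual model calls and tool implementations; the loop structure, tool-result feedback, and turn budget are the parts that carry over.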