Name: rl-research/DR-Tulu-8B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: rl-research

DR-Tulu-8B: An RL-Trained Deep Research Agent

DR-Tulu-8B is an 8 billion parameter model from rl-research, representing the Reinforcement Learning (RL) checkpoint of the DR Tulu project. It is built on top of the supervised fine-tuned (SFT) model, rl-research/DR-Tulu-SFT-8B, and has been specifically trained for advanced tool-use capabilities.

Key Capabilities & Differentiators

RL-Trained for Tool-Use: Unlike many general-purpose LLMs, DR-Tulu-8B has undergone specialized RL training using the dr-agent-lib framework, making it highly effective for tasks requiring external tool interaction.
Superior Research Performance: Benchmarks show significant improvements over its SFT base model and other 8B models on research-focused datasets. For instance, it achieves 88.3% on SQAv2, 52.8% on HealthBench, and 45.4% on DeepResearch Bench, outperforming Qwen3-8B and DR-Tulu-SFT-8B.
Optimized for Deep Research: The model is designed to function as an open deep research agent, leveraging its tool-use proficiency to tackle complex information retrieval and synthesis tasks.

Important Usage Notes

Requires dr-agent-lib: Due to its specialized training, DR-Tulu-8B is not intended for out-of-the-box use with standard HuggingFace or vLLM inference. Users must integrate it with the dr-agent-lib framework for optimal performance.
Research-Oriented: This model is primarily intended for research and educational use, aligning with Ai2's Responsible Use Guidelines.

For detailed information, including training scripts and hyperparameter specifics, refer to the DR Tulu paper and the GitHub repository.

Overview

DR-Tulu-8B: An RL-Trained Deep Research Agent

Key Capabilities & Differentiators

Important Usage Notes

Full Model Card (README)