rl-research/DR-Tulu-8B-Step-1900
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

DR-Tulu-8B-Step-1900 is an 8 billion parameter deep research agent developed by rl-research, fine-tuned with Reinforcement Learning (RL) for advanced tool-use capabilities. Building upon the DR-Tulu-SFT-8B base, this model excels in complex research-oriented tasks, demonstrating significant performance improvements across various benchmarks including SQAv2, HealthBench, and ResearchQA. It is specifically designed for integration with the dr-agent-lib framework, making it suitable for applications requiring sophisticated tool interaction and information retrieval.


DR-Tulu-8B-Step-1900: An RL-Tuned Deep Research Agent

DR-Tulu-8B-Step-1900 is an 8 billion parameter language model developed by rl-research, specifically optimized as a deep research agent. This model is an RL-tuned checkpoint of the base DR-Tulu-SFT-8B model, having undergone Reinforcement Learning training on a specialized dataset (rl-research/dr-tulu-rl-data). Its primary differentiator is its advanced tool-use capability, designed to work seamlessly with the dr-agent-lib framework.

Key Capabilities & Performance

  • Enhanced Tool-Use: Specifically trained for complex tool interaction, making it suitable for agentic workflows.
  • Superior Research Performance: Demonstrates notable improvements over its SFT base and other 8B models across a suite of research-focused benchmarks.
  • Benchmark Highlights (Average Score): Achieves an average score of 61.1 on the DeepResearch Bench, outperforming DR-Tulu-SFT-8B (56.0) and Qwen3-8B (38.6 with search pipeline).
  • Specific Benchmark Gains: Shows significant gains in SQAv2 (86.8%), HealthBench (50.2%), and ResearchQA (74.3%).
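The relative gains implied by the reported DeepResearch Bench averages can be worked out directly from the numbers above. The snippet below is simple arithmetic on the published scores, not a re-evaluation:

```python
# Average DeepResearch Bench scores as reported on this model card.
scores = {
    "DR-Tulu-8B-Step-1900": 61.1,
    "DR-Tulu-SFT-8B": 56.0,
    "Qwen3-8B (search pipeline)": 38.6,
}

rl_score = scores["DR-Tulu-8B-Step-1900"]
for name, score in scores.items():
    if name == "DR-Tulu-8B-Step-1900":
        continue
    delta = rl_score - score
    print(f"vs {name}: +{delta:.1f} points ({delta / score:+.1%})")

# vs DR-Tulu-SFT-8B: +5.1 points (+9.1%)
# vs Qwen3-8B (search pipeline): +22.5 points (+58.3%)
```

In other words, the RL-tuned checkpoint improves on its SFT base by about 9% relative and on the Qwen3-8B search pipeline by roughly 58% relative on this benchmark.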

Intended Use Cases

  • Deep Research Applications: Ideal for tasks requiring extensive information retrieval, synthesis, and complex problem-solving.
  • Agentic Workflows: Best utilized within the dr-agent-lib framework for tool-augmented reasoning.
  • Academic and Educational Research: Licensed under Apache 2.0 for research and educational purposes, adhering to Ai2's Responsible Use Guidelines.

Note: This model is specifically engineered for tool-use and requires integration with the dr-agent-lib framework for optimal performance. Direct inference with standard Hugging Face Transformers or vLLM setups may not yield the expected results. For detailed usage and installation instructions, refer to the DR Tulu GitHub repository and the DR Tulu paper.