rl-research/DR-Tulu-8B-Step-1900
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

DR-Tulu-8B-Step-1900 is an 8 billion parameter deep research agent developed by rl-research, fine-tuned with Reinforcement Learning (RL) for advanced tool-use capabilities. Building upon the DR-Tulu-SFT-8B base, this model excels in complex research-oriented tasks, demonstrating significant performance improvements across various benchmarks including SQAv2, HealthBench, and ResearchQA. It is specifically designed for integration with the dr-agent-lib framework, making it suitable for applications requiring sophisticated tool interaction and information retrieval.


DR-Tulu-8B-Step-1900: An RL-Tuned Deep Research Agent

DR-Tulu-8B-Step-1900 is an 8 billion parameter language model developed by rl-research, specifically optimized as a deep research agent. This model is an RL-tuned checkpoint of the base DR-Tulu-SFT-8B model, having undergone Reinforcement Learning training on a specialized dataset (rl-research/dr-tulu-rl-data). Its primary differentiator is its advanced tool-use capability, designed to work seamlessly with the dr-agent-lib framework.

Key Capabilities & Performance

  • Enhanced Tool-Use: Specifically trained for complex tool interaction, making it suitable for agentic workflows.
  • Superior Research Performance: Demonstrates notable improvements over its SFT base and other 8B models across a suite of research-focused benchmarks.
  • Benchmark Highlights (Average Score): Achieves an average score of 61.1 on the DeepResearch Bench, outperforming DR-Tulu-SFT-8B (56.0) and Qwen3-8B (38.6 with search pipeline).
  • Specific Benchmark Gains: Shows significant gains in SQAv2 (86.8%), HealthBench (50.2%), and ResearchQA (74.3%).
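The relative gains implied by the reported DeepResearch Bench averages can be worked out directly from the numbers above. The snippet below is simple arithmetic on the published scores, not a re-evaluation:

```python
# Average DeepResearch Bench scores as reported on this model card.
scores = {
    "DR-Tulu-8B-Step-1900": 61.1,
    "DR-Tulu-SFT-8B": 56.0,
    "Qwen3-8B (search pipeline)": 38.6,
}

rl_score = scores["DR-Tulu-8B-Step-1900"]
for name, score in scores.items():
    if name == "DR-Tulu-8B-Step-1900":
        continue
    delta = rl_score - score
    print(f"vs {name}: +{delta:.1f} points ({delta / score:+.1%})")

# vs DR-Tulu-SFT-8B: +5.1 points (+9.1%)
# vs Qwen3-8B (search pipeline): +22.5 points (+58.3%)
```

In other words, the RL-tuned checkpoint improves on its SFT base by about 9% relative and on the Qwen3-8B search pipeline by roughly 58% relative on this benchmark.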

Intended Use Cases

  • Deep Research Applications: Ideal for tasks requiring extensive information retrieval, synthesis, and complex problem-solving.
  • Agentic Workflows: Best utilized within the dr-agent-lib framework for tool-augmented reasoning.
  • Academic and Educational Research: Licensed under Apache 2.0 for research and educational purposes, adhering to Ai2's Responsible Use Guidelines.

Note: This model is specifically engineered for tool-use and requires integration with the dr-agent-lib framework for optimal performance. Direct inference with standard Hugging Face Transformers or vLLM setups may not yield the expected results. For detailed usage and installation instructions, refer to the DR Tulu GitHub repository and the DR Tulu paper.