DR Tulu-No-RLER-8B: An Ablation Model for Deep Research Agents
DR Tulu-No-RLER-8B is an 8-billion-parameter model developed by rl-research, serving as an ablation variant of the main DR Tulu model. It is built on top of rl-research/DR-Tulu-SFT-8B and has undergone Reinforcement Learning (RL) training on a dedicated dataset (rl-research/dr-tulu-rl-data).
Key Characteristics & Purpose
- Ablation Model: This specific version is trained without Reinforcement Learning with Evolving Rubrics (RLER). Its primary purpose is to allow researchers to analyze the effect and contribution of RLER to the overall DR Tulu framework.
- Tool-Use Focused: The model is trained specifically for tool use within the dr-agent-lib framework. This means it is designed to interact with external tools and APIs, rather than generating free-form text directly.
- Research & Educational Use: Licensed under Apache 2.0, it is intended for academic research and educational purposes, adhering to Ai2's Responsible Use Guidelines.
Usage Considerations
Due to its specialized training for tool-use with dr-agent-lib, direct inference using standard HuggingFace or vLLM setups may not yield optimal results. Users interested in deploying or experimenting with this model should refer to the DR Tulu GitHub repository for detailed installation and usage instructions, which outline how to run the model within its intended framework.
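As a rough illustration, a standalone serving attempt might look like the sketch below. Everything here is an assumption, not taken from the DR Tulu repository: the Hugging Face hub id is inferred from this card's name, and the vLLM flags are generic defaults. The repository's own instructions for running inside dr-agent-lib should take precedence.

```python
# Hypothetical standalone serving sketch. Assumptions (not from the repo):
# - the hub id matches this model card's name
# - generic vLLM CLI flags; the card warns results outside dr-agent-lib
#   may be suboptimal
model_id = "rl-research/DR-Tulu-No-RLER-8B"  # assumed hub id

# Build the vLLM serve command rather than invoking it here.
serve_cmd = ["vllm", "serve", model_id, "--dtype", "bfloat16"]
print(" ".join(serve_cmd))
```

Again, this bypasses the tool-use scaffolding the model was trained for, so treat it as a starting point for experimentation only.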
This model provides valuable insights into the components of advanced RL training for deep research agents, particularly concerning the role of RLER.