Agent-STAR-RL-3B: Long-Horizon Tool Orchestration
Agent-STAR-RL-3B is a 3.1-billion-parameter large language model (LLM) fine-tuned specifically for long-horizon tool orchestration. Developed by Xixi Wu et al. and introduced in the paper "Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe", the model is built on the Qwen2.5-3B-Instruct backbone.
Key Capabilities
- Tool Orchestration: Handles complex, multi-turn agentic environments that require selecting and invoking the right tools in sequence.
- Constraint Satisfaction: Optimized to satisfy multiple interacting constraints within these environments.
- Reinforcement Learning (RL) Tuned: Utilizes a unified post-training pipeline (Data Synthesis → SFT → RL) with staged rewards and enhanced exploration during the RL phase, which is particularly beneficial for smaller models like this 3B variant.
- Benchmark Performance: Tuned for long-horizon benchmarks such as TravelPlanner.
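The staged-reward idea behind the RL phase can be illustrated with a minimal sketch. The stage definitions, field names, and weights below are illustrative assumptions for exposition only, not the exact reward used to train Agent-STAR-RL-3B:

```python
# Illustrative sketch of a staged reward for tool-using agents.
# Stages, trajectory fields, and weights are assumptions, not the
# paper's actual reward design.

def staged_reward(trajectory: dict) -> float:
    """Score a rollout in stages: format validity first, then tool-call
    success, then final constraint satisfaction. Later stages only pay
    out once earlier ones pass, which densifies the learning signal for
    small models that rarely solve the full task early in training."""
    reward = 0.0
    # Stage 1: every emitted tool call must be well-formed (parseable).
    if not trajectory["calls_well_formed"]:
        return reward
    reward += 0.2
    # Stage 2: fraction of tool calls that executed without error.
    reward += 0.3 * trajectory["successful_call_fraction"]
    # Stage 3: fraction of task constraints the final answer satisfies.
    reward += 0.5 * trajectory["constraints_satisfied_fraction"]
    return reward

# Example: well-formed calls, 80% succeed, half the constraints met.
r = staged_reward({
    "calls_well_formed": True,
    "successful_call_fraction": 0.8,
    "constraints_satisfied_fraction": 0.5,
})
# r = 0.2 + 0.3 * 0.8 + 0.5 * 0.5 = 0.69
```

The gating choice (returning early on malformed calls) is one common way to keep rewards dense without rewarding degenerate outputs; the paper's actual shaping may differ.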
Good For
- Developing agents that require sequential tool use over extended interactions.
- Research into reinforcement learning techniques for LLMs in agentic settings.
- Applications demanding efficient and constrained tool calling in complex scenarios.
For detailed inference instructions and the ReAct-based inference pipeline, refer to the official GitHub repository.
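A ReAct-style pipeline alternates model "Thought/Action" steps with tool observations. The sketch below shows only the control flow, using a stubbed policy and a toy tool registry; the real prompts, tool schemas, and stop conditions are defined in the official repository, and `policy` stands in for the model's generation call:

```python
# Minimal ReAct-style control loop. `policy` is a stand-in for the
# model's generate() call; the tool registry and the Action/Final Answer
# line format are toy assumptions for illustration.
import json

TOOLS = {
    # Toy tool; a real agent would register search, booking APIs, etc.
    "add": lambda a, b: a + b,
}

def react_loop(policy, task: str, max_turns: int = 8) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        step = policy(transcript)  # model emits one Action or a Final Answer
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            call = json.loads(step.removeprefix("Action:").strip())
            result = TOOLS[call["tool"]](*call["args"])
            transcript += f"Observation: {result}\n"  # feed tool output back
    return "max turns exceeded"

# Stub policy: call the add tool once, then answer with its observation.
def stub_policy(transcript: str) -> str:
    if "Observation:" in transcript:
        obs = transcript.rsplit("Observation: ", 1)[1].strip()
        return f"Final Answer: {obs}"
    return 'Action: {"tool": "add", "args": [2, 3]}'

print(react_loop(stub_policy, "compute 2 + 3"))  # prints "5"
```

Swapping `stub_policy` for a call into the fine-tuned model (e.g. via `transformers`) yields the full agent loop.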