Agent-STAR-RL-3B: Long-Horizon Tool Orchestration
Agent-STAR-RL-3B is a 3.1-billion-parameter large language model (LLM) fine-tuned specifically for long-horizon tool orchestration. Developed by Xixi Wu et al. and introduced in the paper "Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe", the model is built on the Qwen2.5-3B-Instruct backbone.
Key Capabilities
- Tool Orchestration: Handles complex, multi-turn agentic environments that require selecting and invoking the right tools in sequence.
- Constraint Satisfaction: Optimized to satisfy multiple interacting constraints within these environments.
- Reinforcement Learning (RL) Tuned: Utilizes a unified post-training pipeline (Data Synthesis → SFT → RL) with staged rewards and enhanced exploration during the RL phase, which is particularly beneficial for smaller models like this 3B variant.
- Benchmark Performance: Tuned for long-horizon benchmarks such as TravelPlanner.
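The staged-reward idea behind the RL phase can be illustrated with a minimal sketch. The stage definitions, field names, and weights below are illustrative assumptions for exposition only, not the exact reward used to train Agent-STAR-RL-3B:

```python
# Illustrative sketch of a staged reward for tool-using agents.
# Stages, trajectory fields, and weights are assumptions, not the
# paper's actual reward design.

def staged_reward(trajectory: dict) -> float:
    """Score a rollout in stages: format validity first, then tool-call
    success, then final constraint satisfaction. Later stages only pay
    out once earlier ones pass, which densifies the learning signal for
    small models that rarely solve the full task early in training."""
    reward = 0.0
    # Stage 1: every emitted tool call must be well-formed (parseable).
    if not trajectory["calls_well_formed"]:
        return reward
    reward += 0.2
    # Stage 2: fraction of tool calls that executed without error.
    reward += 0.3 * trajectory["successful_call_fraction"]
    # Stage 3: fraction of task constraints the final answer satisfies.
    reward += 0.5 * trajectory["constraints_satisfied_fraction"]
    return reward

# Example: well-formed calls, 80% succeed, half the constraints met.
r = staged_reward({
    "calls_well_formed": True,
    "successful_call_fraction": 0.8,
    "constraints_satisfied_fraction": 0.5,
})
# r = 0.2 + 0.3 * 0.8 + 0.5 * 0.5 = 0.69
```

The gating choice (returning early on malformed calls) is one common way to keep rewards dense without rewarding degenerate outputs; the paper's actual shaping may differ.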
Good For
- Developing agents that require sequential tool use over extended interactions.
- Research into reinforcement learning techniques for LLMs in agentic settings.
- Applications demanding efficient and constrained tool calling in complex scenarios.
For detailed inference instructions and the ReAct-based inference pipeline, refer to the official GitHub repository.
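A ReAct-style pipeline alternates model "Thought/Action" steps with tool observations. The sketch below shows only the control flow, using a stubbed policy and a toy tool registry; the real prompts, tool schemas, and stop conditions are defined in the official repository, and `policy` stands in for the model's generation call:

```python
# Minimal ReAct-style control loop. `policy` is a stand-in for the
# model's generate() call; the tool registry and the Action/Final Answer
# line format are toy assumptions for illustration.
import json

TOOLS = {
    # Toy tool; a real agent would register search, booking APIs, etc.
    "add": lambda a, b: a + b,
}

def react_loop(policy, task: str, max_turns: int = 8) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        step = policy(transcript)  # model emits one Action or a Final Answer
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            call = json.loads(step.removeprefix("Action:").strip())
            result = TOOLS[call["tool"]](*call["args"])
            transcript += f"Observation: {result}\n"  # feed tool output back
    return "max turns exceeded"

# Stub policy: call the add tool once, then answer with its observation.
def stub_policy(transcript: str) -> str:
    if "Observation:" in transcript:
        obs = transcript.rsplit("Observation: ", 1)[1].strip()
        return f"Final Answer: {obs}"
    return 'Action: {"tool": "add", "args": [2, 3]}'

print(react_loop(stub_policy, "compute 2 + 3"))  # prints "5"
```

Swapping `stub_policy` for a call into the fine-tuned model (e.g. via `transformers`) yields the full agent loop.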