Agent-STAR-RL-1.5B: Long-Horizon Tool-Using Agent
Agent-STAR-RL-1.5B is a 1.5-billion-parameter model, built on the Qwen2.5-1.5B-Instruct backbone and trained specifically for long-horizon tool orchestration and planning. Developed by xxwu, it is the product of the STAR pipeline (Data Synthesis → SFT → RL), a systematic study of reinforcement learning (RL) for tool-using agents.
Key Capabilities
- Reinforcement Learning (RL) Optimization: Trained with RL to navigate complex, multi-turn environments.
- Tool Orchestration: Designed to effectively use and coordinate various tools for task completion.
- Long-Horizon Planning: Capable of planning and executing actions over extended interaction sequences.
- Scale-Aware Training: Uses RL recipes tailored to smaller models, including staged rewards and enhanced exploration, to handle complex task constraints.
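To make the staged-rewards idea above concrete, here is a minimal sketch of how partial credit for intermediate milestones can densify a sparse task reward. All function names, milestone flags, and reward values are illustrative assumptions; the paper's actual reward design may differ.

```python
# Hypothetical staged reward: the agent earns partial credit for
# intermediate milestones (well-formed tool call, valid arguments)
# before the sparse terminal reward, easing exploration for small models.

def staged_reward(turn):
    """turn: dict of boolean milestone flags (names are illustrative)."""
    reward = 0.0
    if turn.get("tool_call_parsed"):   # stage 1: output parses as a tool call
        reward += 0.1
    if turn.get("tool_call_valid"):    # stage 2: tool name and args are valid
        reward += 0.2
    if turn.get("task_completed"):     # stage 3: sparse terminal reward
        reward += 1.0
    return reward

# Example: a turn that parses and validates but does not finish the task
partial = staged_reward({"tool_call_parsed": True, "tool_call_valid": True})
```

A turn that only emits malformed text scores 0, so the policy still gets a gradient signal toward producing valid tool calls long before it ever completes a full task.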
Good For
- Developing Tool-Using Agents: Ideal for researchers and developers building agents that need to interact with external tools.
- Complex Multi-Turn Environments: Suited for applications requiring agents to maintain context and plan across many steps.
- Research in RL for LLMs: Provides a practical example and testbed for studying RL design spaces in language models, as detailed in the associated paper: Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe.
This model supports a 32,768-token context length and is demonstrated on the TravelPlanner testbed, using training data from the Agent-STAR-TravelDataset.
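The multi-turn loop such an agent runs can be sketched as follows. The model call is replaced by a scripted stub so the sketch runs offline; in practice the stub would be a `generate()` call on Agent-STAR-RL-1.5B. The tool names and the `<tool>name(arg)</tool>` action format are illustrative assumptions, not the model's actual protocol.

```python
# Sketch of a multi-turn tool-orchestration loop of the kind this model is
# trained for: the policy emits an action, the environment executes the tool
# and feeds the observation back, until the policy stops calling tools.
import re

# Hypothetical tools, standing in for a TravelPlanner-style environment.
TOOLS = {
    "search_flights": lambda city: f"3 flights found to {city}",
    "book_hotel": lambda city: f"hotel booked in {city}",
}

def stub_policy(history):
    """Stand-in for the LLM: emits one tool call per turn, then finishes."""
    script = ["<tool>search_flights(Paris)</tool>",
              "<tool>book_hotel(Paris)</tool>",
              "DONE: itinerary complete"]
    n_turns = sum(1 for m in history if m["role"] == "assistant")
    return script[n_turns]

def run_agent(max_turns=5):
    history = [{"role": "user", "content": "Plan a trip to Paris."}]
    for _ in range(max_turns):
        action = stub_policy(history)
        history.append({"role": "assistant", "content": action})
        call = re.match(r"<tool>(\w+)\((\w+)\)</tool>", action)
        if not call:                   # no tool call: episode ends
            break
        name, arg = call.groups()
        result = TOOLS[name](arg)      # execute tool, feed result back
        history.append({"role": "tool", "content": result})
    return history

history = run_agent()
```

The point of the long context length is visible here: every tool result is appended to `history`, so plans spanning many steps must fit in the model's window.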