Agent-STAR-RL-1.5B: Long-Horizon Tool-Using Agent
Agent-STAR-RL-1.5B is a 1.5-billion-parameter model, built on the Qwen2.5-1.5B-Instruct backbone and trained specifically for long-horizon tool orchestration and planning. Developed by xxwu, it is the product of the STAR pipeline (Data Synthesis → SFT → RL), a systematic study of reinforcement learning (RL) for tool-using agents.
Key Capabilities
- Reinforcement Learning (RL) Optimization: Trained with RL to navigate complex, multi-turn environments.
- Tool Orchestration: Designed to effectively use and coordinate various tools for task completion.
- Long-Horizon Planning: Capable of planning and executing actions over extended interaction sequences.
- Scale-Aware Training: Uses RL recipes tailored to smaller models, including staged rewards and enhanced exploration, to handle complex task constraints.
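To make the staged-rewards idea above concrete, here is a minimal sketch of how partial credit for intermediate milestones can densify a sparse task reward. All function names, milestone flags, and reward values are illustrative assumptions; the paper's actual reward design may differ.

```python
# Hypothetical staged reward: the agent earns partial credit for
# intermediate milestones (well-formed tool call, valid arguments)
# before the sparse terminal reward, easing exploration for small models.

def staged_reward(turn):
    """turn: dict of boolean milestone flags (names are illustrative)."""
    reward = 0.0
    if turn.get("tool_call_parsed"):   # stage 1: output parses as a tool call
        reward += 0.1
    if turn.get("tool_call_valid"):    # stage 2: tool name and args are valid
        reward += 0.2
    if turn.get("task_completed"):     # stage 3: sparse terminal reward
        reward += 1.0
    return reward

# Example: a turn that parses and validates but does not finish the task
partial = staged_reward({"tool_call_parsed": True, "tool_call_valid": True})
```

A turn that only emits malformed text scores 0, so the policy still gets a gradient signal toward producing valid tool calls long before it ever completes a full task.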
Good For
- Developing Tool-Using Agents: Ideal for researchers and developers building agents that need to interact with external tools.
- Complex Multi-Turn Environments: Suited for applications requiring agents to maintain context and plan across many steps.
- Research in RL for LLMs: Provides a practical example and testbed for studying RL design spaces in language models, as detailed in the associated paper: Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe.
This model supports a 32,768-token context length and is demonstrated on the TravelPlanner testbed, using training data from the Agent-STAR-TravelDataset.
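The multi-turn loop such an agent runs can be sketched as follows. The model call is replaced by a scripted stub so the sketch runs offline; in practice the stub would be a `generate()` call on Agent-STAR-RL-1.5B. The tool names and the `<tool>name(arg)</tool>` action format are illustrative assumptions, not the model's actual protocol.

```python
# Sketch of a multi-turn tool-orchestration loop of the kind this model is
# trained for: the policy emits an action, the environment executes the tool
# and feeds the observation back, until the policy stops calling tools.
import re

# Hypothetical tools, standing in for a TravelPlanner-style environment.
TOOLS = {
    "search_flights": lambda city: f"3 flights found to {city}",
    "book_hotel": lambda city: f"hotel booked in {city}",
}

def stub_policy(history):
    """Stand-in for the LLM: emits one tool call per turn, then finishes."""
    script = ["<tool>search_flights(Paris)</tool>",
              "<tool>book_hotel(Paris)</tool>",
              "DONE: itinerary complete"]
    n_turns = sum(1 for m in history if m["role"] == "assistant")
    return script[n_turns]

def run_agent(max_turns=5):
    history = [{"role": "user", "content": "Plan a trip to Paris."}]
    for _ in range(max_turns):
        action = stub_policy(history)
        history.append({"role": "assistant", "content": action})
        call = re.match(r"<tool>(\w+)\((\w+)\)</tool>", action)
        if not call:                   # no tool call: episode ends
            break
        name, arg = call.groups()
        result = TOOLS[name](arg)      # execute tool, feed result back
        history.append({"role": "tool", "content": result})
    return history

history = run_agent()
```

The point of the long context length is visible here: every tool result is appended to `history`, so plans spanning many steps must fit in the model's window.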