xxwu/Agent-STAR-RL-3B

Hugging Face
Text generation · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: Mar 23, 2026 · License: MIT · Architecture: Transformer · Open weights

Agent-STAR-RL-3B is a 3.1-billion-parameter large language model developed by Xixi Wu et al. and fine-tuned for long-horizon tool orchestration. Built on the Qwen2.5-3B-Instruct backbone, it uses a Data Synthesis → SFT → RL pipeline to strengthen agentic capabilities. The model targets complex, multi-turn environments that require diverse tool calls to satisfy multifaceted constraints, and is optimized in particular for benchmarks such as TravelPlanner.


Agent-STAR-RL-3B: Long-Horizon Tool Orchestration

Agent-STAR-RL-3B is a 3.1 billion parameter Large Language Model (LLM) specifically fine-tuned for long-horizon tool orchestration tasks. Developed by Xixi Wu et al. and introduced in the paper "Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe", this model is built upon the Qwen2.5-3B-Instruct backbone.

Key Capabilities

  • Tool Orchestration: Handles complex, multi-turn agentic environments in which the model must select and invoke the right tools across many steps.
  • Constraint Satisfaction: Optimized to satisfy multifaceted constraints within these environments.
  • Reinforcement Learning (RL) Tuned: Utilizes a unified post-training pipeline (Data Synthesis → SFT → RL) with staged rewards and enhanced exploration during the RL phase, which is particularly beneficial for smaller models like this 3B variant.
  • Benchmark Performance: Optimized for benchmarks such as TravelPlanner.
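The staged-reward idea above can be sketched in a few lines. The stage boundaries, names, and weights below are illustrative assumptions, not the paper's exact scheme: early stages reward basic competence (well-formed tool calls) so a small model gets a dense learning signal, and the final stage rewards only full task success.

```python
# Illustrative sketch of a staged reward for a long-horizon tool agent.
# Stage names and weights are hypothetical, not the paper's exact recipe.

def staged_reward(trajectory, stage):
    """Return a scalar reward for one rollout, depending on the training stage.

    trajectory: dict with
      - "tool_calls":  list of (name, ok) pairs, ok=True if the call parsed/ran
      - "constraints": list of bools, True if a task constraint is satisfied
    stage: 0 (format only), 1 (format + partial progress), 2 (full task reward)
    """
    calls = trajectory["tool_calls"]
    valid_frac = sum(ok for _, ok in calls) / max(len(calls), 1)
    cons = trajectory["constraints"]
    sat_frac = sum(cons) / max(len(cons), 1)

    if stage == 0:                       # reward well-formed tool use only
        return valid_frac
    if stage == 1:                       # blend format and partial progress
        return 0.5 * valid_frac + 0.5 * sat_frac
    return float(all(cons))             # final stage: all-or-nothing success

# Example rollout: 2/2 valid tool calls, 1/2 constraints satisfied
traj = {"tool_calls": [("search_flights", True), ("book_hotel", True)],
        "constraints": [True, False]}
print(staged_reward(traj, 0))  # 1.0
print(staged_reward(traj, 1))  # 0.75
print(staged_reward(traj, 2))  # 0.0
```

Shaping the reward this way is one common reason staged schemes help smaller models: a 3B policy rarely completes a full TravelPlanner-style task at the start of RL, so an all-or-nothing reward alone would give it almost no gradient signal.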

Good For

  • Developing agents that require sequential tool use over extended interactions.
  • Research into reinforcement learning techniques for LLMs in agentic settings.
  • Applications demanding efficient and constrained tool calling in complex scenarios.

For detailed inference instructions and the ReAct-based inference pipeline, refer to the official GitHub repository.
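To show the shape of a ReAct-style loop, here is a minimal sketch with a stubbed model and a tiny tool registry. The `Action: name[args]` syntax, the `run_react` helper, and the stub are all illustrative assumptions; the model's real prompt format and tool interface are defined in the official repository.

```python
import re

# Minimal ReAct-style control loop: the model alternates Thought/Action steps,
# the harness executes each Action against a tool registry and feeds the
# Observation back, until the model emits a Final Answer.

TOOLS = {
    "add": lambda a, b: str(int(a) + int(b)),  # toy tool for demonstration
}

def stub_model(history):
    """Stand-in for the LLM: emits one Action, then a Final Answer."""
    if "Observation" not in history:
        return "Thought: I should add the numbers.\nAction: add[2, 3]"
    return "Final Answer: 5"

def run_react(question, model, max_turns=5):
    history = f"Question: {question}"
    for _ in range(max_turns):
        step = model(history)
        history += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        m = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if m:
            name, raw_args = m.group(1), m.group(2)
            args = [a.strip() for a in raw_args.split(",")]
            obs = TOOLS[name](*args)          # run the tool
            history += f"\nObservation: {obs}"  # feed the result back
    return None

print(run_react("What is 2 + 3?", stub_model))  # → 5
```

In the real pipeline the stub is replaced by generation from the fine-tuned model, and the loop's turn budget matters: long-horizon tasks are exactly those where many such Action/Observation cycles accumulate before the final answer.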