xxwu/Agent-STAR-RL-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Mar 23, 2026 · License: MIT · Architecture: Transformer

Agent-STAR-RL-7B by xxwu is a 7.6 billion parameter model based on Qwen2.5-7B-Instruct, fine-tuned with Reinforcement Learning (RL) for long-horizon tool-use tasks. It is optimized for complex environments such as TravelPlanner, which require orchestrating multiple tools to satisfy multifaceted constraints. The model is trained with GRPO using a dense SUM reward, improving performance and convergence speed in agentic applications.


Agent-STAR-RL-7B Overview

Agent-STAR-RL-7B is a 7.6 billion parameter language model derived from Qwen2.5-7B-Instruct, specifically fine-tuned using Reinforcement Learning (RL) for advanced tool-use capabilities. Developed by xxwu, this model is a key artifact from the research paper "Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe" (arXiv:2603.21972).

Key Capabilities & Features

  • RL-Optimized Tool Use: Fine-tuned with a novel STAR (Data Synthesis → SFT → RL) pipeline, focusing on scaling RL in complex, multi-turn environments.
  • Long-Horizon Task Performance: Optimized for challenging tasks requiring extensive tool orchestration, such as the TravelPlanner testbed, which involves satisfying commonsense and hard constraints.
  • Efficient RL Training: Utilizes GRPO (Group Relative Policy Optimization) with a dense SUM reward for improved performance and faster convergence, as detailed in the associated research.
  • Qwen2.5-7B-Instruct Backbone: Built upon a robust base model, enhancing its foundational language understanding and generation capabilities.
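The GRPO-with-dense-SUM-reward idea from the list above can be sketched in a few lines of Python. The constraint checks, reward scale, and group size below are hypothetical illustrations rather than the paper's actual reward function: each rollout earns one point per satisfied constraint (the dense SUM reward, as opposed to an all-or-nothing pass/fail signal), and GRPO normalizes each rollout's reward against its sampled group's mean and standard deviation to obtain an advantage without a learned critic.

```python
from statistics import mean, pstdev

def sum_reward(satisfied: list[bool]) -> float:
    """Dense SUM reward: one point per satisfied constraint.

    A sparse reward would instead pay 1.0 only when *all* constraints hold,
    giving the policy far fewer learning signals on long-horizon tasks.
    """
    return float(sum(satisfied))

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each rollout's reward against
    the mean/std of its own sampled group (no value network needed)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of 4 rollouts on a TravelPlanner-style task with
# 3 constraints (e.g. budget, cuisine preference, valid itinerary).
group = [
    [True, True, True],     # all constraints satisfied
    [True, True, False],
    [True, False, False],
    [False, False, False],  # nothing satisfied
]
rewards = [sum_reward(s) for s in group]  # [3.0, 2.0, 1.0, 0.0]
advantages = grpo_advantages(rewards)
```

Note how even partially successful rollouts receive distinct rewards, so the group ranking is informative at every step of training rather than only when a rollout fully succeeds.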

Recommended Use Cases

  • Agentic Frameworks: Designed for integration into ReAct-style agentic systems that require sophisticated tool interaction.
  • Complex Planning & Orchestration: Ideal for applications involving multi-step planning and the coordinated use of various tools to achieve long-horizon goals.
  • Research in RL for LLMs: A valuable resource for researchers exploring reinforcement learning techniques for enhancing large language models in agent-based scenarios.
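For the ReAct-style integration mentioned above, a minimal agent loop might look like the sketch below. The tool registry, the `Action:`/`Final Answer:` markers, and the scripted policy standing in for a model call are all illustrative assumptions; a real deployment would replace `policy` with a call to Agent-STAR-RL-7B and expose domain tools such as flight or hotel search.

```python
from typing import Callable

# Hypothetical tool registry; a real agent would register domain tools here.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr)),  # toy tool, unsafe outside demos
}

def react_loop(task: str, policy: Callable[[str], str], max_steps: int = 5) -> str:
    """Minimal ReAct loop: the policy emits thoughts and actions; tool
    observations are appended to the transcript until a final answer appears."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = policy(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            name, _, arg = step.split("Action:", 1)[1].strip().partition(" ")
            observation = TOOLS[name](arg)
            transcript += f"\nObservation: {observation}"
    return "no answer within step budget"

# Scripted stand-in for the model: first use the tool, then answer.
script = iter([
    "Thought: I should compute the total cost.\nAction: calculator 120*3",
    "Final Answer: The 3-night hotel cost is 360.",
])
print(react_loop("Total cost of 3 nights at $120?", lambda t: next(script)))
```

In practice the `policy` callable would run the model on the growing transcript each turn, which is exactly the multi-turn, long-horizon setting the RL fine-tuning targets.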