Agent-STAR-RL-7B Overview
Agent-STAR-RL-7B is a 7.6-billion-parameter language model derived from Qwen2.5-7B-Instruct and fine-tuned with Reinforcement Learning (RL) for advanced tool-use capabilities. Developed by xxwu, the model accompanies the research paper "Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe" (arXiv:2603.21972).
Key Capabilities & Features
- RL-Optimized Tool Use: Fine-tuned with a novel STAR pipeline (Data Synthesis → supervised fine-tuning (SFT) → RL), with a focus on scaling RL in complex, multi-turn environments.
- Long-Horizon Task Performance: Optimized for challenging tasks requiring extensive tool orchestration, such as the TravelPlanner testbed, which involves satisfying commonsense and hard constraints.
- Efficient RL Training: Utilizes GRPO (Group Relative Policy Optimization) with a dense SUM reward for improved performance and faster convergence, as detailed in the associated research.
- Qwen2.5-7B-Instruct Backbone: Inherits the base model's strong language understanding and generation, on top of which the tool-use behavior is trained.
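To make the GRPO-with-dense-SUM-reward idea concrete, here is a minimal sketch of the advantage computation: each rollout in a sampled group receives per-turn rewards, the rollout's dense reward is their sum, and advantages are normalized relative to the group. This is an illustrative simplification, not the paper's actual training code; the function name and the example reward values are made up.

```python
import numpy as np

def grpo_advantages(group_turn_rewards):
    """Group-relative advantages from per-turn rewards.

    Each rollout's dense reward is the SUM of its per-turn rewards
    (the dense SUM reward); advantages are then normalized within
    the sampled group, GRPO-style, instead of using a learned critic.
    """
    returns = np.array([sum(turns) for turns in group_turn_rewards], dtype=float)
    mean, std = returns.mean(), returns.std()
    return (returns - mean) / (std + 1e-8)  # epsilon guards a zero-variance group

# Hypothetical group of 4 rollouts for one prompt, each with 3 turns.
group = [
    [0.2, 0.3, 0.5],  # dense reward 1.0
    [0.1, 0.0, 0.1],  # dense reward 0.2
    [0.4, 0.4, 0.4],  # dense reward 1.2
    [0.0, 0.2, 0.0],  # dense reward 0.2
]
adv = grpo_advantages(group)
```

Rollouts with above-average summed reward get positive advantages and are reinforced; the group mean plays the role of the baseline.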
Recommended Use Cases
- Agentic Frameworks: Designed for integration into ReAct-style agentic systems that require sophisticated tool interaction.
- Complex Planning & Orchestration: Ideal for applications involving multi-step planning and the coordinated use of various tools to achieve long-horizon goals.
- Research in RL for LLMs: A valuable resource for researchers exploring reinforcement learning techniques for enhancing large language models in agent-based scenarios.
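As a sketch of the kind of ReAct-style loop the model is meant to drive: the agent alternates Thought/Action steps with tool Observations until it emits a final answer. Everything here is illustrative; `generate` stands in for a call to the model, and the `search` tool, the bracketed `Action: tool[input]` syntax, and the stop phrase `Final Answer:` are assumptions about the agent format, not a documented API of this model.

```python
import re

def react_loop(generate, tools, task, max_turns=8):
    """Minimal ReAct-style driver (illustrative sketch only).

    `generate(transcript)` returns the model's next step, expected to
    contain either `Action: tool[input]` or `Final Answer: ...`.
    """
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        step = generate(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip()
        act = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if act:
            name, arg = act.group(1), act.group(2)
            obs = tools[name](arg)  # run the tool, feed the result back
            transcript += f"Observation: {obs}\n"
    return None  # ran out of turns without a final answer

# Stub generator standing in for the model, to show the protocol shape.
def fake_generate(transcript):
    if "Observation:" not in transcript:
        return "Thought: look it up.\nAction: search[capital of France]"
    return "Thought: done.\nFinal Answer: Paris"

answer = react_loop(fake_generate, {"search": lambda q: "Paris"},
                    "Capital of France?")
```

In a real deployment the stub would be replaced by a call to the model (e.g. via `transformers` generation), with tool outputs appended to the conversation between turns.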