xxwu/Agent-STAR-RL-7B
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Mar 23, 2026 | License: MIT | Architecture: Transformer | 0.0K | Open Weights | Cold

Agent-STAR-RL-7B by xxwu is a 7.6-billion-parameter model based on Qwen2.5-7B-Instruct and fine-tuned with reinforcement learning (RL) for long-horizon tool-use tasks. It is optimized for complex environments such as TravelPlanner, where the agent must orchestrate tools to satisfy multifaceted constraints. Training uses GRPO with a dense SUM reward, which improves performance and speeds convergence in agentic applications.
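
To make the training signal concrete, here is a minimal sketch of how a summed reward could feed the group-relative advantage used by GRPO (Group Relative Policy Optimization). It is illustrative only and not the author's training code: the helper names `sum_reward` and `group_relative_advantages`, the three per-constraint scores, and the reading of "SUM reward" as a plain sum of component scores are all assumptions.

```python
from statistics import mean, pstdev


def sum_reward(component_scores):
    """Hypothetical dense reward: the plain sum of per-constraint scores.

    Assumption: each score is in [0, 1] and rewards one satisfied constraint.
    """
    return float(sum(component_scores))


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its own group.

    All rollouts in `rewards` are assumed to come from the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real trainers may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one query, each scored on three constraints.
group = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
rewards = [sum_reward(scores) for scores in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because the reward is a sum rather than an all-or-nothing success flag, rollouts that satisfy only some constraints still receive distinct advantages within the group, which is one plausible reason a dense reward would speed up convergence.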
