RioLee/ToolRM-Gen-Qwen3-4B-Thinking-2507
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Nov 10, 2025 | License: cc-by-nc-sa-4.0 | Architecture: Transformer | Open Weights

RioLee/ToolRM-Gen-Qwen3-4B-Thinking-2507 is a 4-billion-parameter generative reward model from the Qwen3 family, developed by RioLee and designed specifically for agentic tool-use scenarios. It excels at pairwise reward judgments and at broader critique tasks such as Best-of-N sampling and self-correction, outperforming larger LLMs on these specialized evaluations. With a 40,960-token context length, it is optimized for evaluating and improving AI assistant performance in complex tool-use conversations.
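As a rough illustration of how a pairwise judge can drive Best-of-N selection, the sketch below runs a single-elimination (knockout) tournament over N candidate responses. This is an assumption, not the authors' documented procedure: `judge` is a hypothetical callable standing in for a call to the reward model (its prompt format is not shown), and `toy_judge` is a placeholder heuristic used only so the example runs, not a description of the model's behavior.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: judge(context, a, b) returns True if response `a`
# is preferred over response `b`. A real implementation would wrap a call to
# a pairwise reward model; that wiring is deliberately left out here.
PairwiseJudge = Callable[[str, str, str], bool]


def best_of_n_knockout(context: str, candidates: Sequence[str],
                       judge: PairwiseJudge) -> str:
    """Select one of N candidates via a knockout tournament.

    Each round pairs adjacent candidates and the preferred one advances;
    an unpaired candidate gets a bye. Every comparison eliminates exactly
    one candidate, so the tournament makes N - 1 judge calls in total.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    pool: List[str] = list(candidates)
    while len(pool) > 1:
        next_round: List[str] = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if judge(context, a, b) else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # bye for the unpaired candidate
        pool = next_round
    return pool[0]


def toy_judge(context: str, a: str, b: str) -> bool:
    # Placeholder heuristic for illustration only: prefer a response that
    # calls the hypothetical get_weather tool, then prefer the shorter one.
    score_a = ("get_weather(" in a, -len(a))
    score_b = ("get_weather(" in b, -len(b))
    return score_a >= score_b


cands = [
    "I think it's sunny.",
    "get_weather(city='Paris')",
    "Let me check: get_weather(city='Paris', units='metric')",
    "No idea.",
]
print(best_of_n_knockout("What's the weather in Paris?", cands, toy_judge))
```

A knockout keeps the cost linear in N, which matters when each comparison is a full generation from a thinking model. If the judge shows position bias, one option is to query both orderings of each pair and keep only consistent verdicts, at twice the cost.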
