RioLee/ToolRM-Gen-Qwen3-4B-Thinking-2507
RioLee/ToolRM-Gen-Qwen3-4B-Thinking-2507 is a 4-billion-parameter generative reward model from the Qwen3 family, developed by RioLee and designed specifically for agentic tool-use scenarios. It excels at pairwise reward judgments and broader critique tasks such as Best-of-N sampling and self-correction, outperforming larger LLMs on these specialized evaluations. With a 40,960-token context length, the model is optimized for evaluating and improving AI assistant performance in complex tool-use conversations.
# ToolRM-Gen-Qwen3-4B-Thinking-2507: Agentic Tool-Use Reward Model
This model is part of the ToolRM family, a suite of lightweight generative and discriminative reward models engineered specifically for agentic tool use. Developed by RioLee, this 4-billion-parameter model is built on the Qwen3 architecture and is designed to evaluate and critique AI assistant performance in scenarios involving tool utilization.
## Key Capabilities
- Pairwise Critique: Conducts thorough comparisons between two generated assistant responses, making a clear choice of the superior option based on specific evidence and evaluation criteria.
- Pointwise Critique: Provides concise critiques on how a single assistant response should be revised, or identifies it as correct.
- Best-of-N Critique: Evaluates multiple assistant responses and selects the best one.
- Tool-Use Evaluation: Assesses whether available tools are used appropriately and completely, validates tool calls and their arguments, and penalizes fabricated or repetitive actions.
- Reinforcement Learning Support: Provides verifiable feedback signals that can drive downstream RL training.
## Unique Approach
ToolRM models are trained on the novel ToolPref-Pairwise-30K dataset, constructed with a pipeline of rule-based scoring and multidimensional sampling. Evaluation is performed on TRBench-BFCL, a benchmark built on the agentic evaluation suite BFCL. With its 40,960-token context length, this model has demonstrated superior performance in pairwise reward judgments compared to several larger LLMs.
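One way to use such pairwise judgments for Best-of-N sampling is a simple tournament over the candidates. This is a sketch under stated assumptions: `judge` is a hypothetical placeholder for a call to the reward model that returns `"A"` or `"B"` for whichever of its two arguments is preferred.

```python
# Sketch of Best-of-N selection driven by pairwise reward judgments.
# `judge(a, b)` is a placeholder returning "A" if the first candidate
# is preferred and "B" otherwise; the reward model would back it in practice.
from typing import Callable

def best_of_n(candidates: list[str], judge: Callable[[str, str], str]) -> str:
    """Keep the running winner of successive pairwise judgments."""
    best = candidates[0]
    for challenger in candidates[1:]:
        if judge(best, challenger) == "B":  # "B" = second argument preferred
            best = challenger
    return best
```

This sequential tournament needs only N-1 judge calls, at the cost of depending on the comparison order; a full round-robin is more robust but quadratic in N.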
## Usage Notes
- The model was trained with a maximum input length of 16,384 tokens; longer prompts may lead to unpredictable behavior.
- Swapping the order of assistant responses during evaluation is recommended to mitigate position bias.