Jaew00Lee/Qwen3-4B-PRInTS

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Dec 10, 2025 | License: MIT | Architecture: Transformer | Open Weights | Warm

Jaew00Lee/Qwen3-4B-PRInTS is a 4-billion-parameter, Qwen3-based generative process reward model developed by Jaewoo Lee, Archiki Prasad, Justin Chih-Yao Chen, Zaid Khan, Elias Stengel-Eskin, and Mohit Bansal. It is fine-tuned for long-horizon information-seeking tasks, where it evaluates agent trajectory steps and recursively summarizes context. Its primary strength is fine-grained guidance for information-seeking agents: it scores candidate next steps and maintains a compact trajectory summary within its 40,960-token context window.


Overview of PRInTS Qwen3-4B

PRInTS (Process Reward via Information gain scoring and Trajectory Summarization) Qwen3-4B is a 4-billion-parameter generative process reward model developed by Jaewoo Lee et al. It is fine-tuned from the Qwen3-4B base model, has a 40,960-token context length, and is designed to address context accumulation in long-horizon information-seeking tasks.

Key Capabilities

  • Generative Process Reward Model (PRM): Jointly trained for two core abilities, step scoring and trajectory summarization, which together provide fine-grained guidance.
  • Scoring Mechanism: Evaluates multiple candidate next steps in an agent's trajectory and assigns dense scores by reasoning over several step-quality dimensions (e.g., interpretation of tool outputs, tool call informativeness).
  • Trajectory Summarization: Recursively updates a compact summary of the information-seeking trajectory. This keeps input length bounded while preserving the information needed for subsequent score evaluations.
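The bounded-input idea behind trajectory summarization can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the prompt layout in `build_prm_input` is not the model's actual template, and `truncating_summarizer` is a crude stand-in for the summary that the model itself would generate. The point is only that the scorer sees the question, the current summary, and one candidate step, rather than the full history.

```python
from typing import Callable, Iterable

# Hypothetical signature: (previous summary, newest step) -> updated summary.
SummarizeFn = Callable[[str, str], str]


def build_prm_input(question: str, summary: str, candidate_step: str) -> str:
    """Assemble the text a scorer would see for one candidate step.

    Illustrative layout only; the real prompt format is not specified here.
    """
    return (
        f"Question:\n{question}\n\n"
        f"Trajectory summary:\n{summary or '(empty)'}\n\n"
        f"Candidate step:\n{candidate_step}\n"
    )


def run_recursive_summary(
    steps: Iterable[str], summarize: SummarizeFn, summary: str = ""
) -> str:
    """Fold each new step into the summary instead of keeping the full history."""
    for step in steps:
        summary = summarize(summary, step)
    return summary


def truncating_summarizer(max_chars: int) -> SummarizeFn:
    """Toy stand-in for a model-generated summary: keep the last max_chars characters."""

    def summarize(summary: str, step: str) -> str:
        merged = f"{summary} | {step}" if summary else step
        return merged[-max_chars:]

    return summarize


if __name__ == "__main__":
    steps = [f"step {i}: visited page {i}" for i in range(50)]
    final = run_recursive_summary(steps, truncating_summarizer(80))
    print(len(final))  # never exceeds 80, however many steps were folded in
    print(build_prm_input("Which page answers the query?", final, "search: page 50"))
```

Because the summary is capped, the scorer's input size depends on the cap and the candidate step, not on the number of steps taken so far.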

Use Cases

  • Agent Guidance: Provides fine-grained, step-level guidance for information-seeking agents at test time.
  • Information-Seeking Tasks: Optimized for scenarios requiring long-horizon information retrieval and processing.
  • Trajectory Evaluation: Estimates step-level information-gain scores across multiple agent rollouts, which can inform the choice of the next step in complex tasks.
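As a rough picture of test-time guidance, the sketch below runs a best-of-n loop: at each turn an agent proposes several candidate steps, a scorer rates them, the top-scoring step is kept, and the summary is updated. The `propose`, `score`, and `summarize` callables are hypothetical placeholders for agent and PRM calls, and the toy functions exist only to make the example runnable. This is one plausible way to use step-level scores, not necessarily the procedure the authors use.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callables; in practice each would wrap an agent or PRM call.
ProposeFn = Callable[[str, str], List[str]]   # (question, summary) -> candidate steps
ScoreFn = Callable[[str, str, str], float]    # (question, summary, candidate) -> score
SummarizeFn = Callable[[str, str], str]       # (summary, chosen step) -> new summary


def guided_rollout(
    question: str,
    propose: ProposeFn,
    score: ScoreFn,
    summarize: SummarizeFn,
    max_steps: int = 5,
) -> Tuple[List[str], str]:
    """Pick the highest-scoring candidate at each turn and fold it into the summary."""
    summary = ""
    chosen: List[str] = []
    for _ in range(max_steps):
        candidates = propose(question, summary)
        if not candidates:
            break
        scores = [score(question, summary, c) for c in candidates]
        best = candidates[scores.index(max(scores))]  # first maximum wins ties
        chosen.append(best)
        summary = summarize(summary, best)
    return chosen, summary


# Toy stand-ins so the sketch runs without any model.
TOY_VALUES: Dict[str, float] = {
    "search: capital of France": 3.0,
    "search: weather": 1.0,
    "answer: unknown": 0.0,
}


def toy_propose(question: str, summary: str) -> List[str]:
    return list(TOY_VALUES)


def toy_score(question: str, summary: str, candidate: str) -> float:
    # Penalise steps already reflected in the summary (no new information).
    return -1.0 if candidate in summary else TOY_VALUES[candidate]


def toy_summarize(summary: str, step: str) -> str:
    return f"{summary}; {step}" if summary else step


if __name__ == "__main__":
    steps, final_summary = guided_rollout(
        "What is the capital of France?",
        toy_propose, toy_score, toy_summarize, max_steps=3,
    )
    print(steps)
    print(final_summary)
```

The scorer receives only the question, the running summary, and the candidate, so the loop's per-turn input stays small even as the rollout grows.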

This model is released under the MIT license. Its development is described in the paper "PRInTS: Reward Modeling for Long-Horizon Information Seeking".