IntelliAsk-Qwen3-32B: Specialized Peer Review Question Generation

IntelliAsk-Qwen3-32B is a 32-billion-parameter model fine-tuned from Qwen3-32B with Group Relative Policy Optimization (GRPO), using a custom reward model called IntelliReward. Developed by Karun Sharma et al., its primary contribution is generating high-quality, in-depth peer review questions for research papers, going beyond the shallow questions that models trained with standard supervised fine-tuning (SFT) typically produce.
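GRPO drops PPO's learned value baseline in favor of a group-relative one: several completions are sampled for the same prompt, each is scored by the reward model, and each score is normalized against the group's mean and standard deviation. The Python sketch below shows only that advantage step, assuming one prompt corresponds to one paper and that IntelliReward returns a scalar on a 0-3 scale. It is not the authors' training code; `group_relative_advantages` is a hypothetical name and the reward values are made up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style outcome advantage: (r_i - group mean) / (group std + eps).

    Uses the population standard deviation; some implementations use the
    sample estimate instead. If every reward in the group is identical,
    all advantages are zero and the group contributes no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four questions sampled for the same paper.
scores = [0.0, 1.0, 2.0, 1.0]
adv = group_relative_advantages(scores)
print(adv)  # above-average questions get positive advantage, below-average negative
```

In full GRPO, each completion's advantage is applied to all of its tokens inside a clipped policy-gradient objective with a KL penalty toward a reference model; that part is omitted here.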

Key Capabilities & Differentiators

High-Quality Question Generation: Achieves a score of 0.55/3.0 in automatic evaluation via IntelliReward and 0.66/3.0 in human evaluation for question quality, outperforming Gemini 2.5 Pro (0.60).
Reduced First-Page Bias: Shows a lower first-page bias (21.37%) than the other models compared, indicating that it engages with the full paper rather than only the introduction.
Improved General Benchmarks: The reinforcement learning (RL) training for question quality also transfers to general reasoning and writing tasks, improving over the base Qwen3-32B on benchmarks including DROP, MuSR, GPQA-Diamond, WritingBench, and Arena Hard.
IntelliReward Model: Training uses a specialized reward model trained on 572 expert-annotated question-paper pairs, focusing on the 'Effort', 'Evidence', and 'Grounding' dimensions, which reaches 72% mean accuracy in reward prediction.
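The exact scoring scheme is not spelled out here. One plausible reading, labeled as an assumption throughout, is that IntelliReward marks each of the three dimensions as 0 or 1, that a question's score is their sum (which would match the 0-3 scale used in the results above), and that "mean accuracy" averages per-dimension agreement with the expert labels. The hypothetical Python sketch below implements that reading; `DimensionScores`, `mean_accuracy`, and `average_score` are illustrative names, not part of any released code.

```python
from dataclasses import dataclass

# Dimension names taken from the model card; the 0/1 scoring is an assumption.
DIMENSIONS = ("effort", "evidence", "grounding")


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical per-question labels; each dimension assumed to be 0 or 1."""
    effort: int
    evidence: int
    grounding: int

    def total(self) -> int:
        # Assumption: the overall score is the sum of the three dimensions (0-3 scale).
        return self.effort + self.evidence + self.grounding


def mean_accuracy(predicted: list[DimensionScores], gold: list[DimensionScores]) -> float:
    """Average over dimensions of the fraction of questions where prediction matches the expert label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    per_dim = []
    for dim in DIMENSIONS:
        correct = sum(getattr(p, dim) == getattr(g, dim) for p, g in zip(predicted, gold))
        per_dim.append(correct / len(gold))
    return sum(per_dim) / len(per_dim)


def average_score(scores: list[DimensionScores]) -> float:
    """Mean overall question score on the assumed 0-3 scale."""
    return sum(s.total() for s in scores) / len(scores)


# Toy example with made-up labels (not data from the model card).
gold = [DimensionScores(1, 1, 0), DimensionScores(0, 0, 0)]
pred = [DimensionScores(1, 0, 0), DimensionScores(0, 0, 1)]
print(mean_accuracy(pred, gold))  # per-dimension: effort 1.0, evidence 0.5, grounding 0.5
print(average_score(gold))
```

Under this reading, a reported score such as 0.55/3.0 would be `average_score` over the evaluated questions, and the 72% figure would be `mean_accuracy` of the reward model against the held-out expert annotations.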

Good For

Generating peer review questions for ML papers, particularly in NLP and CV, targeting venues like ICLR, NeurIPS, CVPR, ACL, and EMNLP.
Applications requiring deep, evidence-based questioning from textual content.
Tasks benefiting from enhanced general reasoning and writing abilities.
