IntelliAsk-Qwen3-32B: Specialized Peer Review Question Generation
IntelliAsk-Qwen3-32B is a 32-billion-parameter model fine-tuned from Qwen3-32B using Group Relative Policy Optimization (GRPO) with a custom reward model called IntelliReward. Developed by Karun Sharma et al., the model is built to generate high-quality, in-depth peer review questions for research papers, moving beyond the shallow questions typically produced by standard supervised fine-tuning (SFT) models.
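The core idea behind GRPO is that each sampled output is scored relative to the other outputs sampled for the same input, rather than against a learned value function. The following is a minimal sketch of that group-relative normalization only, not the authors' training code. The function name, the `eps` constant, the use of the population standard deviation, and the example reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    `rewards` holds one scalar reward per question sampled for the same paper.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores (0-3 scale) for four questions sampled for one paper.
rewards = [0.0, 1.0, 3.0, 2.0]
advantages = group_relative_advantages(rewards)

# Questions scoring above the group mean get positive advantages,
# those below the mean get negative ones.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

In a full training loop, these advantages would weight the policy-gradient update for each sampled question; that part is omitted here.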
Key Capabilities & Differentiators
- High-Quality Question Generation: Scores 0.55/3.0 in automatic evaluation with IntelliReward and 0.66/3.0 in human evaluation of question quality, outperforming Gemini 2.5 Pro (0.60).
- Reduced First-Page Bias: Shows a lower first-page bias (21.37%) than other models, suggesting that it engages with the full paper rather than only the introduction.
- Improved General Benchmarks: RL training for question quality also transfers to general reasoning and writing tasks, with higher scores than the base Qwen3-32B on benchmarks such as DROP, MuSR, GPQA-Diamond, WritingBench, and Arena Hard.
- IntelliReward Model: Uses a specialized reward model trained on 572 expert-annotated question-paper pairs, focusing on the 'Effort', 'Evidence', and 'Grounding' dimensions and achieving 72% mean accuracy in reward prediction.
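The scores above are reported on a 0 to 3 scale, and IntelliReward focuses on three dimensions. The text here does not say how the dimensions combine into a single score, so the sketch below assumes, purely for illustration, that each dimension is a binary label (0 or 1) and that the question score is their sum. The `QuestionLabels` class and `mean_score` helper are hypothetical names, not part of any released API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionLabels:
    """Per-question labels on the three IntelliReward dimensions.

    Assumption: each label is binary (0 or 1); the source does not confirm this.
    """
    effort: int
    evidence: int
    grounding: int

    def total(self) -> int:
        """Sum the three labels into a 0-3 score (assumed aggregation rule)."""
        for name in ("effort", "evidence", "grounding"):
            value = getattr(self, name)
            if value not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        return self.effort + self.evidence + self.grounding


def mean_score(labels):
    """Average the 0-3 totals over a batch of labeled questions."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(item.total() for item in labels) / len(labels)


# Hypothetical labels for two generated questions.
batch = [
    QuestionLabels(effort=1, evidence=0, grounding=1),
    QuestionLabels(effort=0, evidence=0, grounding=0),
]
print(mean_score(batch))
```

Under this assumed scheme, an average such as 0.55/3.0 would mean that a typical question satisfies fewer than one of the three dimensions.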
Good For
- Generating peer review questions for ML papers, particularly in NLP and CV, targeting venues like ICLR, NeurIPS, CVPR, ACL, and EMNLP.
- Applications requiring deep, evidence-based questioning from textual content.
- Tasks benefiting from enhanced general reasoning and writing abilities.