anonymousatom/IntelliAsk-Qwen3-32B-450-Merged
IntelliAsk-Qwen3-32B-450-Merged is a 32 billion parameter model fine-tuned from Qwen3-32B by anonymousatom using GRPO with IntelliReward. This model specializes in generating high-quality peer review questions for research papers, demonstrating superior depth and relevance compared to other LLMs. It achieves strong performance in automatic and human evaluations for question quality, and also shows improved general reasoning and writing capabilities.
Loading preview...
IntelliAsk-Qwen3-32B: Specialized Peer Review Question Generation
IntelliAsk-Qwen3-32B is a 32 billion parameter model, fine-tuned from Qwen3-32B using Group Relative Policy Optimization (GRPO) with a custom reward model called IntelliReward. Developed by Karun Sharma et al., its primary innovation lies in generating high-quality, in-depth peer review questions for research papers, moving beyond the shallow questions typically produced by standard SFT models.
Key Capabilities & Differentiators
- High-Quality Question Generation: Achieves a score of 0.55/3.0 in automatic evaluation via IntelliReward and 0.66/3.0 in human evaluation for question quality, outperforming Gemini 2.5 Pro (0.60).
- Reduced First-Page Bias: Demonstrates a lower first-page bias (21.37%) compared to other models, indicating it engages with the full paper content rather than just the introduction.
- Improved General Benchmarks: The RL training for question quality also transfers to general reasoning and writing tasks, showing improved scores on benchmarks like DROP, MuSR, GPQA-Diamond, WritingBench, and Arena Hard compared to the base Qwen3-32B.
- IntelliReward Model: Utilizes a specialized reward model trained on 572 expert-annotated question-paper pairs, focusing on 'Effort', 'Evidence', and 'Grounding' dimensions, achieving 72% mean accuracy in reward prediction.
Good For
- Generating peer review questions for ML papers, particularly in NLP and CV, targeting venues like ICLR, NeurIPS, CVPR, ACL, and EMNLP.
- Applications requiring deep, evidence-based questioning from textual content.
- Tasks benefiting from enhanced general reasoning and writing abilities.