WestlakeNLP/WhizReviewer-ML-Llama3.1-8B
WestlakeNLP/WhizReviewer-ML-Llama3.1-8B is an 8 billion parameter language model based on the Llama3.1 architecture, fine-tuned specifically to generate expert-level paper review comments in machine learning fields (computer vision, natural language processing, and multimedia). It simulates multiple reviewers and a meta-reviewer to provide comprehensive evaluations and scores for academic papers. The model is designed to promote iterative self-improvement in scientific research and to assist AI-driven research as a reward model.
WhizReviewer-ML-Llama3.1-8B: AI-Powered Academic Review
This model is an 8 billion parameter language model in the WhizReviewer series developed by WestlakeNLP, derived from the Llama3.1 pre-trained model. It has undergone extensive supervised fine-tuning on a dataset of papers paired with review comments from the machine learning domain (including Computer Vision, Natural Language Processing, and Multimedia).
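A minimal sketch of preparing a paper for the model. The prompt layout and the character-per-token heuristic below are assumptions for illustration; in practice the chat template should come from the model's tokenizer (e.g. `tokenizer.apply_chat_template`), and the budget should be measured in actual tokens.

```python
# Hypothetical helper: assemble a paper into one prompt string while keeping
# it inside the model's ~14,000-token input budget. The 4-chars-per-token
# ratio is a rough heuristic for English text, not a property of this model.

MAX_INPUT_TOKENS = 14_000
CHARS_PER_TOKEN = 4  # rough approximation; measure with the real tokenizer

def build_review_prompt(title: str, abstract: str, body: str) -> str:
    """Concatenate paper sections, truncating the body to fit the input budget."""
    header = f"Title: {title}\n\nAbstract: {abstract}\n\n"
    budget_chars = MAX_INPUT_TOKENS * CHARS_PER_TOKEN - len(header)
    return header + body[: max(budget_chars, 0)]

prompt = build_review_prompt("An Example Paper", "We study X.", "1. Introduction ...")
```

The truncation-from-the-end choice is itself an assumption; dropping appendices first or summarizing long sections may preserve more reviewable content.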
Key Capabilities
- Expert-level Review Generation: Generates detailed paper reviews, simulating multiple reviewers and a meta-reviewer, complete with summaries, soundness assessments, presentation evaluations, contribution analysis, strengths, weaknesses, questions, and a final rating.
- Evaluation Score Prediction: Provides near-human-level evaluation scores and predicts the final acceptance outcome (Accept/Reject) for papers, calibrated to ICLR and NeurIPS review standards.
- Iterative Research Improvement: Designed to help researchers rapidly iterate and refine their papers by providing quick, comprehensive feedback.
- Auto-Research Support: Can function as a Reward Model to enhance the research capabilities of artificial intelligence systems.
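The reward-model use case above can be sketched as post-processing of a generated review: extract the final numeric rating and map it to a scalar reward. The `Rating: N` line format and the 1-10 scale are assumptions about the review template, not confirmed by this card.

```python
import re

# Hypothetical sketch: turn a generated review into a scalar reward in [0, 1].
# Assumes the review ends with a line like "Rating: 8"; adapt the regex to the
# model's actual output format.

def extract_reward(review_text: str, min_score: int = 1, max_score: int = 10) -> float:
    """Map the last numeric rating found in a review to a reward in [0, 1]."""
    ratings = re.findall(r"Rating:\s*(\d+)", review_text)
    if not ratings:
        return 0.0  # no parseable rating: treat as lowest reward
    score = int(ratings[-1])  # last match ~ the meta-reviewer's rating
    score = max(min_score, min(score, max_score))
    return (score - min_score) / (max_score - min_score)
```

Taking the last match approximates the meta-reviewer's final score when several simulated reviewers each emit a rating; a production setup would parse the structured review sections instead.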
Performance & Limitations
Evaluated on 784 ICLR 2024 papers, the 8B model achieved 59.41% accuracy in predicting Accept/Reject decisions. Because of its long context requirements (up to 14,000 input tokens and 5,000 output tokens), it needs significant GPU memory, e.g. 2 x A100/H100 GPUs when run in bf16. The model's knowledge cutoff is January 2024, and it is a text-only model that cannot directly process images or charts. It is explicitly not intended for official peer review, and mechanisms such as Fast-Detect-GPT are used to detect generated reviews and prevent misuse.
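A back-of-the-envelope check on the memory recommendation above. The architecture figures used here (32 layers, 8 KV heads via grouped-query attention, head dimension 128) are assumptions based on the Llama-3.1-8B base model, not stated in this card.

```python
# Rough memory estimate for running the model in bf16 at full context.
# Architecture numbers below are assumed from the Llama-3.1-8B base model.

BYTES_BF16 = 2
N_PARAMS = 8e9
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128  # assumed Llama-3.1-8B config

weights_gb = N_PARAMS * BYTES_BF16 / 1e9  # ~16 GB for the weights alone

def kv_cache_gb(seq_len: int) -> float:
    """KV cache size in GB: K and V tensors per layer, bf16, GQA heads."""
    per_token_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_BF16
    return per_token_bytes * seq_len / 1e9

# Full context: 14,000 input + 5,000 output tokens.
total_gb = weights_gb + kv_cache_gb(14_000 + 5_000)
```

Weights alone (~16 GB) plus the full-context KV cache (~2.5 GB) and activation overhead explain why a single consumer GPU is marginal and the card recommends two A100/H100-class GPUs.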