WestlakeNLP/WhizReviewer-ML-Llama3.1-8B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Aug 13, 2024 · License: whizreviewer-llama-3.1-license · Architecture: Transformer

WestlakeNLP/WhizReviewer-ML-Llama3.1-8B is an 8-billion-parameter language model based on the Llama3.1 architecture, fine-tuned to generate expert-level paper reviews for machine learning fields (computer vision, natural language processing, multimedia). It simulates multiple reviewers and a meta-reviewer to provide comprehensive evaluations and scores for academic papers, and it is designed to support iterative self-improvement in scientific research and to serve as a reward model for AI-driven research.

WhizReviewer-ML-Llama3.1-8B: AI-Powered Academic Review

This model is an 8 billion parameter language model, part of the WhizReviewer series developed by WestlakeNLP, derived from the Llama3.1 pre-trained model. It has undergone extensive supervised training on a dataset of paper-review comments from the machine learning domain (including Computer Vision, Natural Language Processing, and Multimedia).
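
For orientation, below is a minimal sketch of querying the model through the Hugging Face transformers library. The system-prompt wording and the use of the standard Llama 3.1 chat template are illustrative assumptions; consult the model card for the official prompt format.

```python
# Minimal sketch: generate a review with Hugging Face transformers.
# Assumptions: the model follows the standard Llama 3.1 chat template, and the
# system prompt below is illustrative, not the official one from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WestlakeNLP/WhizReviewer-ML-Llama3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shards weights across available GPUs (e.g., 2 x A100/H100)
)

paper_text = open("paper.txt").read()  # plain text only; the model cannot see figures
messages = [
    {"role": "system", "content": "You are an expert machine learning reviewer."},
    {"role": "user", "content": paper_text},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The card budgets roughly 14,000 input tokens and up to 5,000 output tokens.
output_ids = model.generate(input_ids, max_new_tokens=5000, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```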

Key Capabilities

  • Expert-level Review Generation: Generates detailed paper reviews, simulating multiple reviewers and a meta-reviewer, complete with summaries, soundness assessments, presentation evaluations, contribution analysis, strengths, weaknesses, questions, and a final rating.
  • Evaluation Score Prediction: Provides near-human-level evaluation scores and predicts the final acceptance outcome (Accept/Reject) for papers, aligned with ICLR/NeurIPS review standards.
  • Iterative Research Improvement: Designed to help researchers rapidly iterate and refine their papers by providing quick, comprehensive feedback.
  • Auto-Research Support: Can function as a reward model to enhance the research capabilities of artificial intelligence systems (see the sketch after this list).
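
The review format the model emits is not fully specified here, so the following is only a hedged sketch of collapsing a generated review into a scalar reward. It assumes reviews contain numeric `Rating:` lines on a 1-10 scale and an overall Accept/Reject decision; the `review_to_reward` helper and its parsing scheme are hypothetical.

```python
# Hypothetical reward extraction from a generated review.
# Assumption: reviews contain lines like "Rating: 6" and an overall
# "Accept" or "Reject" decision; neither format is guaranteed by the card.
import re

def review_to_reward(review: str) -> float:
    """Collapse a generated review into a scalar reward in [0, 1]."""
    ratings = [int(m) for m in re.findall(r"Rating:\s*(\d+)", review)]
    if not ratings:
        return 0.0  # no parseable scores; treat as uninformative
    reward = (sum(ratings) / len(ratings)) / 10.0  # mean of 1-10 ratings
    if re.search(r"\bAccept\b", review):  # small bonus for a predicted accept
        reward = min(1.0, reward + 0.1)
    return reward
```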

Performance & Limitations

Evaluated on 784 ICLR 2024 papers, the 8B model achieved 59.41% accuracy in predicting Accept/Reject decisions. It requires significant GPU memory (e.g., 2 x A100/H100 for bf16) because of its long context budget (roughly 14,000 input tokens plus 5,000 output tokens). The model's knowledge cutoff is January 2024, and it is a text-only model that cannot directly process images or charts. It is explicitly not intended for official peer review, and detection methods such as Fast-Detect-GPT can be applied to its output to deter misuse.
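
Given the fixed input budget, long papers typically need to be trimmed before generation. A small sketch, assuming a simple head-truncation strategy (the card does not prescribe one) and a hypothetical `truncate_paper` helper:

```python
# Hypothetical helper: keep a paper within the ~14,000-token input budget.
# Assumption: head truncation (keeping the start of the paper) is one simple
# strategy; the model card does not prescribe a particular one.
from transformers import AutoTokenizer

MAX_INPUT_TOKENS = 14000  # stated input budget

def truncate_paper(text: str, tokenizer) -> str:
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    if len(ids) <= MAX_INPUT_TOKENS:
        return text
    return tokenizer.decode(ids[:MAX_INPUT_TOKENS])

tokenizer = AutoTokenizer.from_pretrained("WestlakeNLP/WhizReviewer-ML-Llama3.1-8B")
prompt_text = truncate_paper(open("paper.txt").read(), tokenizer)
```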