OpenMOSS-Team/SciJudge-4B-2605

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jul 2, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

SciJudge-4B-2605 is a 4-billion-parameter model developed by OpenMOSS-Team, fine-tuned from Qwen3-4B-Instruct-2507 for scientific paper evaluation. Given the titles, abstracts, and publication dates of two papers, it predicts which one has the higher citation impact. It supports a context length of 32,768 tokens and is part of research on AI learning scientific judgment, or 'taste'.


SciJudge-4B-2605: AI for Scientific Paper Evaluation

SciJudge-4B-2605 is a 4-billion-parameter model developed by OpenMOSS-Team and fine-tuned from Qwen3-4B-Instruct-2507. Its core function is pairwise evaluation of scientific papers: given the title, abstract, and publication date of each of two papers, it predicts which one will achieve the higher citation count.
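
The pairwise input described above can be sketched as a small prompt builder. This is a minimal illustration only: the `Paper` class, the `build_pairwise_prompt` and `parse_choice` helpers, and the prompt wording are all hypothetical, since the exact prompt template the model was trained with is not given here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    """The three fields the model is described as using (hypothetical container)."""
    title: str
    abstract: str
    published: str  # publication date as text, e.g. "2021-03-15"


def build_pairwise_prompt(paper_a: Paper, paper_b: Paper) -> str:
    """Format two papers into one comparison prompt.

    The wording below is an assumed template, not the model's official one.
    """
    def block(label: str, paper: Paper) -> str:
        return (
            f"Paper {label}\n"
            f"Title: {paper.title}\n"
            f"Published: {paper.published}\n"
            f"Abstract: {paper.abstract}\n"
        )

    return (
        "Which of the following two papers is likely to receive more citations?\n\n"
        + block("A", paper_a)
        + "\n"
        + block("B", paper_b)
        + "\nAnswer with a single letter: A or B."
    )


def parse_choice(model_output: str) -> Optional[str]:
    """Return the last standalone 'A' or 'B' in the output, or None if absent."""
    matches = re.findall(r"\b([AB])\b", model_output)
    return matches[-1] if matches else None
```

The parser is deliberately naive: it takes the last standalone capital A or B, so free-form outputs that use "A" as an article could be misread. A stricter answer format would be needed in practice.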

Key Capabilities & Features

  • Citation Impact Prediction: Specialized in comparing two scientific papers and determining which is likely to have greater future citation impact.
  • Contextual Analysis: Utilizes paper titles, abstracts, and publication dates for its comparative judgment.
  • Base Model: Fine-tuned from Qwen3-4B-Instruct-2507, indicating a strong foundation in instruction following and general language understanding.
  • Training Methodology: Trained using GRPO with DAPO loss and an external preference reward, leveraging 720,341 preference pairs from the SciJudgeBench dataset.
  • Performance: Achieves an average accuracy of 77.3% on the SciJudgeBench test split (MAIN_1000 in-domain evaluation set), compared with 58.1% for its base model, a gain of 19.2 percentage points.
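
To make the training and evaluation bullets above more concrete, here is a rough sketch of three pieces: a binary preference reward, the group-relative advantage normalization commonly associated with GRPO, and pairwise accuracy. It rests on assumptions: the function names are hypothetical, the 0/1 reward is a stand-in for the external preference reward (whose actual form is not described here), and the DAPO loss itself is not shown.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def preference_reward(predicted: str, preferred: str) -> float:
    """Assumed stand-in reward: 1.0 if the predicted paper matches the preferred one."""
    return 1.0 if predicted == preferred else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group (GRPO-style).

    Each sampled response is scored relative to the group mean and scaled by
    the group's population standard deviation; eps avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pairwise_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of pairs where the predicted winner matches the labeled winner."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one pair is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For example, a group of four sampled answers with rewards `[1.0, 0.0, 1.0, 0.0]` yields advantages close to +1 and -1, while a group in which every answer is correct yields all zeros and so contributes no learning signal under this normalization.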

Good For

  • Scientific Research Analysis: Ideal for tasks requiring an AI to assess the potential impact or 'taste' of scientific publications.
  • Academic Trend Prediction: Useful for researchers or institutions interested in forecasting the influence of new scientific work.
  • Benchmarking: Serves as a smaller, efficient model for evaluating scientific judgment tasks, complementing its larger counterpart, SciJudge-30B-2605.