UKPLab/SciRM-7B
Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Jan 14, 2026 · License: apache-2.0 · Architecture: Transformer

SciRM-7B: Reward Model for Scientific Writing Evaluation

UKPLab/SciRM-7B is a 7.6-billion-parameter reward model built on Qwen2.5-7B-Instruct and engineered specifically for evaluating scientific writing. Developed by UKPLab, it is trained with a two-stage reinforcement learning framework based on GRPO (Group Relative Policy Optimization): the first stage optimizes scientific evaluation preferences, and the second refines reasoning capabilities. The model targets cost-efficient, open-source assessment of scientific writing without reliance on proprietary LLMs.
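
Since the card does not document the exact inference interface, the following is a minimal sketch of rubric-conditioned scoring. It assumes the checkpoint loads as a standard Hugging Face causal LM with a chat template and emits its assessment as generated text; the `evaluate` helper, rubric wording, and decoding settings are illustrative assumptions, not the documented API.

```python
# Minimal sketch of rubric-conditioned scoring with SciRM-7B. Assumptions
# (not confirmed by the model card): the checkpoint loads as a standard
# Hugging Face causal LM with a chat template, and per-aspect scores are
# emitted as generated text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UKPLab/SciRM-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def evaluate(constitution: str, text: str, max_new_tokens: int = 512) -> str:
    """Score `text` against an explicit evaluation constitution."""
    messages = [
        {"role": "system", "content": constitution},  # the scoring rubric
        {"role": "user", "content": text},            # the text under review
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated assessment, not the echoed prompt.
    return tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True)

# Illustrative rubric; the model's real prompt format may differ.
constitution = (
    "Evaluate the related-work section on (1) coverage of prior work, "
    "(2) faithfulness of citations, and (3) clarity of contrast with the "
    "proposed method. Rate each aspect from 1 to 5 and justify briefly."
)
print(evaluate(constitution, "Prior work on reward models for science ..."))
```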

Key Capabilities

  • Multi-aspect evaluation: Assesses multiple dimensions per task, providing fine-grained feedback.
  • Dynamic scoring rubrics: Evaluation is conditioned on explicit evaluation constitutions, which can be changed at both training and inference time (see the sketch after this list).
  • Cross-task generalization: Handles diverse and previously unseen scientific writing tasks, such as Related Work Section Generation, Scientific Review Writing, Novelty Evaluation Alignment, and Paper Revision Evaluation, without task-specific retraining.
  • Enhanced reasoning: Although not a full reasoning model, it shows improved reasoning ability, especially when guided by clear system prompts and evaluation criteria.
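
Because the constitution is plain text supplied at inference time, switching tasks amounts to switching rubrics. The sketch below illustrates this, reusing the hypothetical `evaluate` helper from the snippet above; the aspect wording is an assumption, not the model's documented prompt format.

```python
# Sketch of swapping evaluation constitutions at inference time, reusing the
# hypothetical `evaluate` helper defined in the previous snippet. Rubric
# wording below is illustrative only.
review_rubric = (
    "Task: Scientific Review Writing.\n"
    "Aspects (rate each 1-5):\n"
    "1. Soundness of the technical assessment\n"
    "2. Specificity and actionability of the feedback\n"
    "3. Balanced coverage of strengths and weaknesses"
)
novelty_rubric = (
    "Task: Novelty Evaluation Alignment.\n"
    "Aspects (rate each 1-5):\n"
    "1. Accurate identification of the claimed contributions\n"
    "2. Grounded comparison against prior work\n"
    "3. Calibrated overall novelty judgment"
)

draft = "We propose a two-stage RL framework for training reward models ..."
# Same weights, different rubric: no task-specific retraining needed.
for rubric in (review_rubric, novelty_rubric):
    print(evaluate(rubric, draft))
```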

Good For

  • Automated evaluation of scientific texts, particularly in NLP/CS domains.
  • Generating actionable feedback for scientific paper reviews.
  • Applications requiring dynamic and multi-faceted assessment of writing quality based on explicit criteria.

Limitations

  • Primarily trained on English scientific text from NLP/CS domains; performance may vary in other fields or languages.
  • Evaluation quality is highly dependent on the specificity and quality of the provided evaluation constitution.
  • Should not be the sole arbiter for high-stakes scientific publishing decisions.