UKPLab/SciRM-Ref-7B
  • Type: Text generation
  • Model size: 7.6B parameters
  • Quantization: FP8
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Jan 14, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

UKPLab/SciRM-Ref-7B is a 7.6 billion parameter reward model developed by UKPLab, based on Qwen2.5-7B-Instruct and designed specifically for evaluating scientific writing. It employs a two-stage reinforcement learning framework to align with scientific evaluation preferences and to enhance reasoning through self-reflection. The model excels at multi-aspect evaluation and cross-task generalization across diverse scientific writing tasks, supporting dynamic scoring rubrics without relying on proprietary LLMs.


Overview

UKPLab/SciRM-Ref-7B is a 7.6 billion parameter reward model from UKPLab, built upon the Qwen2.5-7B-Instruct architecture. It is specifically engineered for the evaluation of scientific writing, utilizing a two-stage reinforcement learning framework (GRPO) to first optimize for scientific evaluation preferences and then refine reasoning capabilities through self-reflection.

Key Capabilities

  • Multi-aspect evaluation: Assesses multiple dimensions of scientific writing per task.
  • Dynamic scoring rubrics: Evaluation is conditioned on explicit criteria at both training and inference times.
  • Cross-task generalization: Capable of handling diverse and previously unseen scientific writing tasks without requiring task-specific retraining.
  • Reasoning enhancement: Improved reasoning capabilities compared to its predecessor, SciRM-7B, through a self-reflection stage.
  • Cost-efficient and open-source: Does not rely on proprietary large language models.
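The dynamic-rubric capability above means the evaluation prompt carries explicit, per-task criteria. A minimal sketch of such a prompt builder follows; the prompt layout, aspect names, and rubric text are illustrative assumptions, not the model's documented system prompt:

```python
# Sketch: assembling a rubric-conditioned evaluation prompt for a reward
# model like SciRM-Ref-7B. The layout and aspect names below are
# assumptions for illustration; consult the model card for the actual
# system prompt and formatting requirements.

def build_eval_prompt(task: str, rubric: dict[str, str], candidate: str) -> str:
    """Condition the evaluation on an explicit, per-task rubric."""
    criteria = "\n".join(f"- {aspect}: {desc}" for aspect, desc in rubric.items())
    return (
        f"Task: {task}\n"
        f"Evaluate the candidate text against each criterion below.\n"
        f"Criteria:\n{criteria}\n"
        f"Candidate:\n{candidate}\n"
    )

# Hypothetical rubric for a related-work generation task.
rubric = {
    "Coverage": "Does the section cover the key prior papers?",
    "Coherence": "Is the discussion logically organized?",
}
prompt = build_eval_prompt("Related work generation", rubric, "Prior studies ...")
```

Because the rubric is passed in at inference time rather than baked into the weights, the same model can score a new task simply by swapping the criteria dictionary.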

Use Cases

This model is ideal for automated evaluation of scientific texts, particularly in domains like NLP/CS. It can be applied to tasks such as related work section generation, scientific review writing, novelty evaluation alignment, and paper revision evaluation. Users should provide clear evaluation criteria and adhere to the system prompt for optimal performance. It is important to note that the model's primary training is on English scientific text from NLP/CS domains, and performance may vary in other scientific fields or languages. It should be used as a tool to assist evaluation, not as the sole arbiter for high-stakes decisions.
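For downstream use, the model's free-text judgment typically needs to be converted into per-aspect numbers. A small parsing sketch is shown below; the `Aspect: x/10` output format is an assumption for illustration, and the model's actual output schema may differ:

```python
import re

# Sketch: extracting per-aspect scores from a generated judgment.
# The "Aspect: x/10" format is a hypothetical output convention;
# adapt the pattern to whatever schema the model actually emits.

def parse_scores(judgment: str) -> dict[str, float]:
    """Pull 'Name: x/10' style scores out of free-text model output."""
    pattern = re.compile(r"^\s*(\w[\w ]*?)\s*:\s*(\d+(?:\.\d+)?)\s*/\s*10", re.M)
    return {name.strip(): float(score) for name, score in pattern.findall(judgment)}

judgment = "Coverage: 8/10\nCoherence: 6.5/10\nOverall comments: solid draft."
scores = parse_scores(judgment)  # per-aspect numeric scores
```

Keeping scoring extraction separate from generation makes it easy to log the full reasoning text alongside the numbers, which is useful when the model is assisting rather than replacing human evaluation.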