UKPLab/SciRM-7B
Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Jan 14, 2026 · License: apache-2.0 · Architecture: Transformer

SciRM-7B: Reward Model for Scientific Writing Evaluation

UKPLab/SciRM-7B is a 7.6-billion-parameter reward model built on Qwen2.5-7B-Instruct and engineered specifically for evaluating scientific writing. Developed by UKPLab, it is trained with a two-stage reinforcement learning framework based on GRPO (Group Relative Policy Optimization): the first stage optimizes scientific evaluation preferences, and the second refines reasoning capabilities. The model targets cost-efficient, open-source assessment of scientific writing without reliance on proprietary LLMs.
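
Since the card does not document the exact inference interface, the following is a minimal sketch of rubric-conditioned scoring. It assumes the checkpoint loads as a standard Hugging Face causal LM with a chat template and emits its assessment as generated text; the `evaluate` helper, rubric wording, and decoding settings are illustrative assumptions, not the documented API.

```python
# Minimal sketch of rubric-conditioned scoring with SciRM-7B. Assumptions
# (not confirmed by the model card): the checkpoint loads as a standard
# Hugging Face causal LM with a chat template, and per-aspect scores are
# emitted as generated text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UKPLab/SciRM-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def evaluate(constitution: str, text: str, max_new_tokens: int = 512) -> str:
    """Score `text` against an explicit evaluation constitution."""
    messages = [
        {"role": "system", "content": constitution},  # the scoring rubric
        {"role": "user", "content": text},            # the text under review
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated assessment, not the echoed prompt.
    return tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True)

# Illustrative rubric; the model's real prompt format may differ.
constitution = (
    "Evaluate the related-work section on (1) coverage of prior work, "
    "(2) faithfulness of citations, and (3) clarity of contrast with the "
    "proposed method. Rate each aspect from 1 to 5 and justify briefly."
)
print(evaluate(constitution, "Prior work on reward models for science ..."))
```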

Key Capabilities

  • Multi-aspect evaluation: Assesses multiple dimensions per task, providing fine-grained feedback.
  • Dynamic scoring rubrics: Evaluation is conditioned on explicit evaluation constitutions, which can be changed at both training and inference time (see the sketch after this list).
  • Cross-task generalization: Handles diverse and previously unseen scientific writing tasks, such as Related Work Section Generation, Scientific Review Writing, Novelty Evaluation Alignment, and Paper Revision Evaluation, without task-specific retraining.
  • Enhanced reasoning: Although not a full reasoning model, it shows improved reasoning ability, especially when guided by clear system prompts and evaluation criteria.
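
Because the constitution is plain text supplied at inference time, switching tasks amounts to switching rubrics. The sketch below illustrates this, reusing the hypothetical `evaluate` helper from the snippet above; the aspect wording is an assumption, not the model's documented prompt format.

```python
# Sketch of swapping evaluation constitutions at inference time, reusing the
# hypothetical `evaluate` helper defined in the previous snippet. Rubric
# wording below is illustrative only.
review_rubric = (
    "Task: Scientific Review Writing.\n"
    "Aspects (rate each 1-5):\n"
    "1. Soundness of the technical assessment\n"
    "2. Specificity and actionability of the feedback\n"
    "3. Balanced coverage of strengths and weaknesses"
)
novelty_rubric = (
    "Task: Novelty Evaluation Alignment.\n"
    "Aspects (rate each 1-5):\n"
    "1. Accurate identification of the claimed contributions\n"
    "2. Grounded comparison against prior work\n"
    "3. Calibrated overall novelty judgment"
)

draft = "We propose a two-stage RL framework for training reward models ..."
# Same weights, different rubric: no task-specific retraining needed.
for rubric in (review_rubric, novelty_rubric):
    print(evaluate(rubric, draft))
```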

Good For

  • Automated evaluation of scientific texts, particularly in NLP/CS domains.
  • Generating actionable feedback for scientific paper reviews.
  • Applications requiring dynamic and multi-faceted assessment of writing quality based on explicit criteria.

Limitations

  • Primarily trained on English scientific text from NLP/CS domains; performance may vary in other fields or languages.
  • Evaluation quality is highly dependent on the specificity and quality of the provided evaluation constitution.
  • Should not be the sole arbiter for high-stakes scientific publishing decisions.