UKPLab/SciRM-Ref-7B
  • Type: Text generation
  • Model size: 7.6B parameters
  • Quantization: FP8
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Jan 14, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

UKPLab/SciRM-Ref-7B is a 7.6 billion parameter reward model developed by UKPLab, based on Qwen2.5-7B-Instruct and designed specifically for evaluating scientific writing. It employs a two-stage reinforcement learning framework to align with scientific evaluation preferences and to enhance reasoning through self-reflection. The model excels at multi-aspect evaluation and cross-task generalization across diverse scientific writing tasks, supporting dynamic scoring rubrics without relying on proprietary LLMs.


Overview

UKPLab/SciRM-Ref-7B is a 7.6 billion parameter reward model from UKPLab, built upon the Qwen2.5-7B-Instruct architecture. It is specifically engineered for the evaluation of scientific writing, utilizing a two-stage reinforcement learning framework (GRPO) to first optimize for scientific evaluation preferences and then refine reasoning capabilities through self-reflection.

Key Capabilities

  • Multi-aspect evaluation: Assesses multiple dimensions of scientific writing per task.
  • Dynamic scoring rubrics: Evaluation is conditioned on explicit criteria at both training and inference times.
  • Cross-task generalization: Capable of handling diverse and previously unseen scientific writing tasks without requiring task-specific retraining.
  • Reasoning enhancement: Improved reasoning capabilities compared to its predecessor, SciRM-7B, through a self-reflection stage.
  • Cost-efficient and open-source: Does not rely on proprietary large language models.
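The dynamic-rubric capability above means the evaluation prompt carries explicit, per-task criteria. A minimal sketch of such a prompt builder follows; the prompt layout, aspect names, and rubric text are illustrative assumptions, not the model's documented system prompt:

```python
# Sketch: assembling a rubric-conditioned evaluation prompt for a reward
# model like SciRM-Ref-7B. The layout and aspect names below are
# assumptions for illustration; consult the model card for the actual
# system prompt and formatting requirements.

def build_eval_prompt(task: str, rubric: dict[str, str], candidate: str) -> str:
    """Condition the evaluation on an explicit, per-task rubric."""
    criteria = "\n".join(f"- {aspect}: {desc}" for aspect, desc in rubric.items())
    return (
        f"Task: {task}\n"
        f"Evaluate the candidate text against each criterion below.\n"
        f"Criteria:\n{criteria}\n"
        f"Candidate:\n{candidate}\n"
    )

# Hypothetical rubric for a related-work generation task.
rubric = {
    "Coverage": "Does the section cover the key prior papers?",
    "Coherence": "Is the discussion logically organized?",
}
prompt = build_eval_prompt("Related work generation", rubric, "Prior studies ...")
```

Because the rubric is passed in at inference time rather than baked into the weights, the same model can score a new task simply by swapping the criteria dictionary.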

Use Cases

This model is ideal for automated evaluation of scientific texts, particularly in domains like NLP/CS. It can be applied to tasks such as related work section generation, scientific review writing, novelty evaluation alignment, and paper revision evaluation. Users should provide clear evaluation criteria and adhere to the system prompt for optimal performance. It is important to note that the model's primary training is on English scientific text from NLP/CS domains, and performance may vary in other scientific fields or languages. It should be used as a tool to assist evaluation, not as the sole arbiter for high-stakes decisions.
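For downstream use, the model's free-text judgment typically needs to be converted into per-aspect numbers. A small parsing sketch is shown below; the `Aspect: x/10` output format is an assumption for illustration, and the model's actual output schema may differ:

```python
import re

# Sketch: extracting per-aspect scores from a generated judgment.
# The "Aspect: x/10" format is a hypothetical output convention;
# adapt the pattern to whatever schema the model actually emits.

def parse_scores(judgment: str) -> dict[str, float]:
    """Pull 'Name: x/10' style scores out of free-text model output."""
    pattern = re.compile(r"^\s*(\w[\w ]*?)\s*:\s*(\d+(?:\.\d+)?)\s*/\s*10", re.M)
    return {name.strip(): float(score) for name, score in pattern.findall(judgment)}

judgment = "Coverage: 8/10\nCoherence: 6.5/10\nOverall comments: solid draft."
scores = parse_scores(judgment)  # per-aspect numeric scores
```

Keeping scoring extraction separate from generation makes it easy to log the full reasoning text alongside the numbers, which is useful when the model is assisting rather than replacing human evaluation.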