zhuohaoyu/RewardAnything-8B-v1
Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Jun 1, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

RewardAnything-8B-v1, developed by Zhuohao Yu and collaborators from Peking University and WeChat AI, is an 8 billion parameter reward model designed for principle-following generalization. Unlike traditional reward models that learn implicit preferences from fixed datasets, RewardAnything interprets natural language principles at inference time, enabling dynamic adaptation to diverse evaluation criteria without retraining. This model excels at providing transparent reasoning for evaluation decisions and integrates seamlessly into existing RLHF pipelines.
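In practice, using a principle-following reward model means packaging a natural-language principle together with the candidate responses in a single evaluation request. The sketch below illustrates that idea; the template, field names, and helper function are illustrative assumptions, not the model's documented input format.

```python
# Hypothetical sketch of a principle-conditioned evaluation request for a
# RewardAnything-style reward model. The prompt template below is an
# illustrative assumption, not the model's actual chat format.

def build_eval_prompt(principle: str, prompt: str, responses: dict[str, str]) -> str:
    """Assemble one evaluation request: a natural-language principle,
    the original user prompt, and the candidate responses to judge."""
    lines = [
        f"Principle: {principle}",
        f"Prompt: {prompt}",
        "Responses:",
    ]
    for name, text in responses.items():
        lines.append(f"[{name}] {text}")
    lines.append("Judge each response against the principle and explain your reasoning.")
    return "\n".join(lines)

request = build_eval_prompt(
    principle="Prefer concise answers; penalize unnecessary hedging.",
    prompt="What is the capital of France?",
    responses={
        "model-a": "Paris.",
        "model-b": "It could be argued that the capital is probably Paris.",
    },
)
print(request.splitlines()[0])
# → Principle: Prefer concise answers; penalize unnecessary hedging.
```

Because the principle travels with each request, swapping evaluation criteria is just a string change, with no retraining step, which is what lets such a model slot into an existing RLHF pipeline as a drop-in judge.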
