virtuoussy/Qwen2.5-7B-Instruct-RLVR
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Mar 31, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The virtuoussy/Qwen2.5-7B-Instruct-RLVR model is a 7 billion parameter generative reward model based on Qwen/Qwen2.5-7B-Instruct. Developed by virtuoussy, it is specifically designed to evaluate the correctness of a given response against a reference answer, functioning as a verifiable reward mechanism. This model is optimized for diverse domains, as detailed in the paper "Expanding RL with Verifiable Rewards Across Diverse Domains," and supports multiple languages including Chinese and English.

Loading preview...