rubricreward/R3-Qwen3-4B-14k
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: May 14, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm
rubricreward/R3-Qwen3-4B-14k is a 4 billion parameter reward model, fine-tuned from Qwen/Qwen3-4B, developed by rubricreward. It is part of the R3 family of Robust Rubric-Agnostic Reward Models, trained on a diverse dataset covering tasks like classification, preference optimization, and question answering. This model specializes in evaluating responses against a provided rubric and returning a score with supporting reasoning, making it suitable for automated assessment and feedback generation.
R3-Qwen3-4B-14k: A Rubric-Agnostic Reward Model
R3-Qwen3-4B-14k is a 4 billion parameter reward model developed by rubricreward, fine-tuned from the Qwen3-4B base model. It is a key component of the R3 (Robust Rubric-Agnostic Reward Models) family, designed to provide robust and detailed evaluations.
Key Capabilities
- Rubric-Agnostic Evaluation: Trained on a unique R3 dataset compiled from 45 diverse sources, enabling it to evaluate responses across various tasks without being tied to a single rubric format.
- Comprehensive Assessment: Each training example includes an instruction, task description, input, response(s), evaluation rubrics, a score, and corresponding reasoning, allowing the model to generate detailed assessments.
- Task Versatility: The training dataset covers a broad spectrum of tasks, including classification, preference optimization, and question answering, enhancing its adaptability to different evaluation scenarios.
- English Language Support: Primarily focused on English language processing for evaluation tasks.
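The fields listed under Comprehensive Assessment can be pictured as a single record. The following is a minimal sketch only: the class name `RubricExample`, its field names, and the prompt layout produced by `build_prompt` are illustrative assumptions, not the actual R3 dataset schema or the prompt format the model was trained on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RubricExample:
    """One evaluation record; field names are illustrative, not the R3 schema."""
    instruction: str
    task_description: str
    input_text: str
    responses: List[str]
    rubric: str
    score: Optional[str] = None      # would be produced by the reward model
    reasoning: Optional[str] = None  # would be produced by the reward model


def build_prompt(example: RubricExample) -> str:
    """Assemble a plain-text evaluation prompt (hypothetical layout)."""
    parts = [
        example.instruction,
        f"Task: {example.task_description}",
        f"Input: {example.input_text}",
    ]
    # Number each candidate response so a rubric can refer to it.
    for i, response in enumerate(example.responses, start=1):
        parts.append(f"Response {i}: {response}")
    parts.append(f"Rubric: {example.rubric}")
    return "\n".join(parts)


example = RubricExample(
    instruction="Evaluate the response against the rubric.",
    task_description="Question answering",
    input_text="What is the capital of France?",
    responses=["Paris is the capital of France."],
    rubric="Score 1-5 for factual accuracy.",
)
prompt = build_prompt(example)
print(prompt)
```

Keeping the rubric as free text, rather than a fixed enum of criteria, mirrors the rubric-agnostic idea: the same record shape can carry a pointwise scale, a pairwise comparison, or a binary check.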
Good For
- Automated Feedback Systems: Ideal for systems requiring automated evaluation of generated text based on specific criteria and rubrics.
- Preference Optimization: Can be used to rank responses or pick the preferred one of a pair, with detailed reasoning to support each judgment.
- Quality Assurance: Suitable for assessing the quality and adherence of responses to given instructions and evaluation guidelines.
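For the preference optimization use case, one way to turn per-response scores into training pairs is sketched below. This is an assumption-laden illustration: `to_preference_pair` and `toy_score` are hypothetical helpers, and `toy_score` is a stand-in where a real pipeline would call the reward model and read its score.

```python
from typing import Callable, Dict, List


def to_preference_pair(
    responses: List[str],
    score_fn: Callable[[str], float],
) -> Dict[str, str]:
    """Return the highest- and lowest-scored responses as a chosen/rejected pair."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    ranked = sorted(responses, key=score_fn, reverse=True)
    return {"chosen": ranked[0], "rejected": ranked[-1]}


# Stand-in scorer (word count); a real pipeline would query the reward model here.
def toy_score(response: str) -> float:
    return float(len(response.split()))


pair = to_preference_pair(
    ["Paris.", "Paris is the capital of France."],
    toy_score,
)
print(pair)
```

Passing the scorer in as a callable keeps the pairing logic independent of how the model is served, so the same function works with a local model, a hosted endpoint, or a test stub.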