teetone/RoboReward-8B
teetone/RoboReward-8B is an 8 billion parameter vision-language reward model for robotics, built upon the Qwen-3 VL architecture. It is specifically trained on the RoboReward dataset to predict discrete end-of-episode progress rewards from real-robot rollout videos. This model excels at evaluating robotic task completion by assigning scores from 1 (no success) to 5 (perfect completion) based on video input and task instructions, making it ideal for robotic learning and evaluation.
Loading preview...
RoboReward-8B: Vision-Language Reward Model for Robotics
RoboReward-8B is an 8 billion parameter vision-language model developed by teetone, designed to provide general-purpose reward signals for robotic tasks. Built on the Qwen-3 VL architecture, it is trained using the RoboReward dataset to analyze real-robot rollout videos.
Key Capabilities
- Discrete Progress Prediction: Given a task instruction and a robot rollout video, the model predicts a discrete end-of-episode progress score from 1 to 5.
- Robotic Task Evaluation: It assesses the final state of a robotic action against a given task, providing a quantitative measure of success or failure.
- Vision-Language Integration: Combines visual information from videos with textual task instructions to understand and evaluate robotic performance.
Reward Rubric
The model uses a specific rubric to assign scores:
- 1 - No Success: No goal-relevant change.
- 2 - Minimal Progress: Small, insufficient change.
- 3 - Partial Completion: Good progress, but major violations or multiple minor ones.
- 4 - Near Completion: Correct intent, but a single minor requirement missed.
- 5 - Perfect Completion: All requirements satisfied.
Use Cases
This model is particularly suited for:
- Robotic Reinforcement Learning: Providing reward signals for training robotic agents.
- Automated Robotic Evaluation: Objectively scoring robotic task performance without human intervention.
- Robotics Research: Aiding in the development and analysis of robotic control policies.