teetone/RoboReward-4B
RoboReward-4B by teetone is a 4-billion-parameter general-purpose vision-language reward model for robotics, built on Qwen3-VL. It is trained on the RoboReward dataset to predict discrete end-of-episode progress scores (1-5) from real-robot rollout videos. Given a rollout video and a task instruction, the model scores how far the robot got toward completing the task, which yields a quantifiable reward signal for robot learning and evaluation.
RoboReward-4B: Vision-Language Reward Model for Robotics
RoboReward-4B is a 4-billion-parameter vision-language model developed by teetone that provides discrete end-of-episode progress rewards for robotic tasks. Built on the Qwen3-VL architecture, it takes a task instruction and a real-robot rollout video as input and scores how well the task was carried out.
Key Capabilities
- Vision-Language Understanding: Integrates visual information from robot videos with textual task instructions.
- Discrete Reward Prediction: Outputs an integer progress score from 1 (No success) to 5 (Perfect completion) for each episode.
- Robotics-Specific Training: Trained on the dedicated RoboReward dataset, focusing on real-world robotic scenarios.
- End-of-Episode Evaluation: Designed to judge the outcome of a completed episode, giving a single graded score per rollout that indicates how close the robot came to finishing the task.
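Because the model reports progress as a discrete score from 1 to 5, downstream code usually needs to turn the generated text into an integer. The sketch below is one way to do that, assuming the model's reply is free-form text that contains the score as a standalone digit; the exact output format is not specified here, and `parse_progress_score` is a hypothetical helper, not part of any released API.

```python
import re
from typing import Optional

# Score range taken from the model description: 1 (No success) to 5 (Perfect completion).
SCORE_MIN, SCORE_MAX = 1, 5


def parse_progress_score(generated_text: str) -> Optional[int]:
    """Return the first standalone digit in [1, 5] found in the text, or None.

    Assumption: the reply contains the score as a single digit that is not
    part of a longer number (so "15" or "2024" are not treated as scores).
    """
    match = re.search(r"(?<!\d)([1-5])(?!\d)", generated_text)
    if match is None:
        return None
    score = int(match.group(1))
    # Defensive range check; the regex already restricts the digit to 1-5.
    return score if SCORE_MIN <= score <= SCORE_MAX else None


if __name__ == "__main__":
    print(parse_progress_score("Progress score: 4"))
    print(parse_progress_score("unable to judge"))
```

Returning `None` instead of a default score keeps unparseable replies visible to the caller, which can then retry or discard the episode.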
Good For
- Robotics Research: Evaluating and quantifying the success of robotic manipulation and navigation tasks.
- Reinforcement Learning in Robotics: Generating reward signals for training robotic agents without manually annotating each rollout.
- Automated Task Assessment: Providing objective, vision-based feedback on robot performance.
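For the reinforcement learning use case, a 1-5 end-of-episode score has to be converted into a numeric reward. The following is a minimal sketch of one common convention: linearly rescale the score to [0, 1] and assign it as a sparse reward on the final step of the episode. Both the linear mapping and the helper names (`normalize_score`, `terminal_rewards`) are assumptions made for illustration and are not taken from the model's documentation or paper.

```python
from typing import List


def normalize_score(score: int) -> float:
    """Map a discrete progress score in [1, 5] to a reward in [0.0, 1.0].

    Assumption: a linear rescaling, so 1 -> 0.0 and 5 -> 1.0.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be in [1, 5], got {score}")
    return (score - 1) / 4.0


def terminal_rewards(num_steps: int, score: int) -> List[float]:
    """Build a per-step reward list with the whole reward on the last step.

    This reflects the end-of-episode nature of the score: intermediate
    steps receive 0.0 and only the final step carries the signal.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rewards = [0.0] * num_steps
    rewards[-1] = normalize_score(score)
    return rewards


if __name__ == "__main__":
    print(terminal_rewards(num_steps=4, score=3))
```

Other mappings, such as thresholding the score into a binary success flag, fit the same interface; which one works best depends on the learning algorithm and the task.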
The model combines visual and language understanding to automate the assessment of robotic task completion; its associated paper describes the approach in detail.