Philip-MIT/SOLE-R1-8B
SOLE-R1-8B by Philip-MIT is a video-language reward reasoning model for robotics. It estimates task progress from robot video frames and a natural-language task description. For each timestep it generates a reasoning trace and a scalar progress prediction, which can serve as a dense reward signal for online robot reinforcement learning. This continuous feedback on task completion is particularly useful when manual reward engineering is impractical. The model accepts visual observations from multiple camera views together with the task description and outputs a progress percentage.
SOLE-R1-8B: Video-Language Reward Reasoning for Robotics
SOLE-R1-8B, developed by Philip-MIT, is a specialized video-language model engineered to provide reward signals for robotic reinforcement learning. Its core function is to estimate task progress by analyzing robot video frames in conjunction with a natural-language task description. The model outputs both a detailed reasoning trace and a scalar progress percentage, which can be directly utilized as a dense reward for online robot reinforcement learning.
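As an illustration of how a progress percentage might be turned into a dense reward, here is a minimal sketch. The function name `progress_to_rewards`, the clamping to the 0-100 range, and the optional per-step delta are assumptions made for this example; the model card does not specify how rewards should be scaled or shaped.

```python
from typing import List


def progress_to_rewards(progress_pct: List[float], use_delta: bool = False) -> List[float]:
    """Map per-timestep progress percentages to scalar rewards.

    Hypothetical helper, not part of SOLE-R1-8B.
    - use_delta=False: reward is the progress scaled to [0, 1].
    - use_delta=True: reward is the change in scaled progress since the
      previous timestep (the first step is compared against 0).
    """
    # Clamp to [0, 100] in case a prediction falls outside the range, then scale.
    scaled = [min(max(p, 0.0), 100.0) / 100.0 for p in progress_pct]
    if not use_delta:
        return scaled

    rewards = []
    prev = 0.0
    for s in scaled:
        rewards.append(s - prev)
        prev = s
    return rewards


if __name__ == "__main__":
    print(progress_to_rewards([0, 25, 50, 120]))
    print(progress_to_rewards([0, 25, 50, 120], use_delta=True))
```

Using the scaled progress directly rewards being close to completion at every step, while the delta variant rewards only forward movement. Which is more suitable depends on the reinforcement learning setup.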
Key Capabilities
- Video-Language Understanding: Interprets robot actions and task descriptions from visual inputs (video frames) and text.
- Progress Estimation: Generates a numerical percentage representing the current completion status of a robotic task.
- Reasoning Traces: Provides human-readable explanations for its progress estimations, formatted as <think>reasoning</think><answer>progress%</answer>.
- Multi-view Processing: Capable of integrating visual observations from multiple camera views (e.g., external and wrist cameras) for comprehensive understanding.
- Reinforcement Learning Rewards: Designed to supply reward signals for on-robot reinforcement learning, especially in scenarios where manual reward engineering is challenging or impossible.
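The output format listed above, <think>reasoning</think><answer>progress%</answer>, can be split into its two parts with a small parser. The following is a minimal sketch. The helper name `parse_progress` and the tolerance for whitespace or a missing percent sign are assumptions for this example, not behavior documented for the model.

```python
import re
from typing import Optional, Tuple

# Matches <think>...</think><answer>NN%</answer>, allowing whitespace
# between the tags and an optional percent sign (assumed leniency).
_OUTPUT_PATTERN = re.compile(
    r"<think>(.*?)</think>\s*<answer>\s*([0-9]+(?:\.[0-9]+)?)\s*%?\s*</answer>",
    re.DOTALL,
)


def parse_progress(text: str) -> Optional[Tuple[str, float]]:
    """Return (reasoning, progress_percent) or None if the format is absent."""
    match = _OUTPUT_PATTERN.search(text)
    if match is None:
        return None
    reasoning = match.group(1).strip()
    progress = float(match.group(2))
    return reasoning, progress


if __name__ == "__main__":
    sample = "<think>The gripper holds the cup.</think><answer>40%</answer>"
    print(parse_progress(sample))
```

Returning `None` for malformed output lets the caller decide how to handle a failed parse, for example by reusing the previous progress value or skipping the reward for that step.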
Training and Data
The model was trained on the SOLE-R1-8B training dataset, which contains robot task progress examples made up of images, prompts, reasoning completions, and progress labels. The training mixture also includes spatial and multi-frame temporal reasoning data from sources such as SSR-CoT, SpatialVLM, and Embodied CoT.
Good For
- Robotics Research: Researchers developing and experimenting with reinforcement learning for robotic control.
- Automated Reward Generation: Projects requiring automated, dense reward signals for complex robotic tasks without manual engineering.
- Task Progress Monitoring: Applications needing real-time, interpretable progress tracking for robot operations.
- Video-based Task Analysis: Analyzing and understanding robot task execution from video footage.