tanhuajie2001/Robo-Dopamine-GRM-2.0-8B-Preview
Robo-Dopamine-GRM-2.0-8B-Preview by tanhuajie2001 is an 8-billion-parameter vision-language model designed for high-precision robotic manipulation. It features a General Reward Model (GRM) that predicts relative progress or regress from multi-view images, together with a Dopamine-RL framework for one-shot adaptation and policy-invariant reward shaping. The model is optimized to provide stable, accurate reward signals that accelerate robotic learning.
Robo-Dopamine-GRM-2.0-8B-Preview: General Process Reward Modeling for Robotics
This model, developed by tanhuajie2001, is an 8-billion-parameter vision-language model (VLM) engineered for high-precision robotic manipulation tasks. It acts as a general process reward model, aiming to provide stable and accurate reward signals for reinforcement learning in robotics.
Key Capabilities
- General Reward Model (GRM): A core vision-language model that interprets task descriptions and multi-view images (initial, goal, "BEFORE," and "AFTER" states) to predict relative progress or regress.
- Multi-Perspective Progress Fusion: Combines incremental, forward-anchored, and backward-anchored predictions to generate a robust and accurate fused reward signal.
- Dopamine-RL Training Framework: Adapts the GRM to a new task from a single demonstration (one-shot adaptation).
- Policy-Invariant Reward Shaping: Converts the GRM's dense output into an effective reward signal that accelerates learning without altering the optimal policy, making it compatible with various RL algorithms.
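The fusion and shaping steps listed above can be illustrated with a minimal Python sketch. It does not use the model's actual API and is not taken from the Robo-Dopamine code: `ProgressEstimates`, `fuse_progress`, and `shaped_reward` are hypothetical names, the weighted average is only one plausible way to fuse the three predictions, and the shaping term assumes the classic potential-based form `gamma * phi(s') - phi(s)`, which is the standard construction known to leave the optimal policy unchanged. The progress values are assumed to lie in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class ProgressEstimates:
    """Three task-progress estimates in [0, 1] for one state.

    Field names follow the three perspectives named in the text; how the
    real model computes each one is not specified here.
    """
    incremental: float
    forward_anchored: float
    backward_anchored: float


def fuse_progress(est, weights=(1.0, 1.0, 1.0)):
    """Fuse the three estimates with a weighted average (an assumption)."""
    values = (est.incremental, est.forward_anchored, est.backward_anchored)
    fused = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    # Clamp so the fused progress stays a valid potential in [0, 1].
    return min(1.0, max(0.0, fused))


def shaped_reward(env_reward, phi_before, phi_after, gamma=0.99, done=False):
    """Add a potential-based shaping term to the environment reward.

    phi_before / phi_after are fused progress values for the BEFORE and
    AFTER states. The potential of a terminal state is taken as zero.
    """
    next_potential = 0.0 if done else phi_after
    return env_reward + gamma * next_potential - phi_before


if __name__ == "__main__":
    before = fuse_progress(ProgressEstimates(0.20, 0.30, 0.40))
    after = fuse_progress(ProgressEstimates(0.45, 0.50, 0.55))
    print(shaped_reward(0.0, before, after, gamma=0.99))
```

Because the shaping term is a difference of potentials, its sum along a trajectory telescopes and depends only on the start state (for `gamma = 1` and a zero terminal potential it equals `-phi(s_0)`). That is why this form can densify the reward without changing which policy is optimal, and why it can be added to the reward of most RL algorithms without further changes.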
Good For
- Developers and researchers working on robotic manipulation requiring precise and stable reward signals.
- Applications where adapting to a new task from a single demonstration (one-shot learning) is crucial.
- Integrating advanced reward modeling into existing reinforcement learning frameworks for robotics.