tanhuajie2001/Robo-Dopamine-GRM-2.0-8B-Preview

Vision | Concurrency cost: 1 | Model size: 8B | Quantization: FP8 | Context length: 32k | Published: Mar 4, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

Robo-Dopamine-GRM-2.0-8B-Preview by tanhuajie2001 is an 8-billion-parameter vision-language model designed for high-precision robotic manipulation. It features a General Reward Model (GRM) that predicts relative progress or regress from multi-view images, paired with a Dopamine-RL framework for one-shot adaptation and policy-invariant reward shaping. The model is optimized to provide stable, accurate reward signals that accelerate robotic learning.


Robo-Dopamine-GRM-2.0-8B-Preview: General Process Reward Modeling for Robotics

This model, developed by tanhuajie2001, is an 8-billion-parameter vision-language model (VLM) engineered for high-precision robotic manipulation. It introduces a new approach to reward modeling that aims to provide stable, accurate reward signals for reinforcement learning in robotics.

Key Capabilities

  • General Reward Model (GRM): A core vision-language model that interprets task descriptions and multi-view images (initial, goal, "BEFORE," and "AFTER" states) to predict relative progress or regress.
  • Multi-Perspective Progress Fusion: Combines incremental, forward-anchored, and backward-anchored predictions to generate a robust and accurate fused reward signal.
  • Dopamine-RL Training Framework: Enables one-shot adaptation of the GRM to a new task from a single demonstration.
  • Policy-Invariant Reward Shaping: Converts the GRM's dense output into an effective reward signal that accelerates learning without altering the optimal policy, making it compatible with various RL algorithms.
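The card does not give the rule used for multi-perspective progress fusion, so the following is only a minimal sketch under stated assumptions. Each perspective is assumed to yield an absolute progress estimate in [0, 1]: the incremental view adds the GRM's predicted BEFORE-to-AFTER change to the previous progress, the forward-anchored view measures progress from the initial state, and the backward-anchored view takes one minus the remaining distance to the goal. The three are then combined with a weighted average. The function names, inputs, and equal default weights are all hypothetical, not the model's actual interface.

```python
def _clip01(x: float) -> float:
    """Clamp a progress value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def fuse_progress(
    prev_progress: float,
    delta: float,
    progress_from_initial: float,
    remaining_to_goal: float,
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Fuse three hypothetical GRM perspectives into one progress estimate in [0, 1]."""
    # Incremental: previous progress plus the predicted BEFORE -> AFTER change
    # (positive for progress, negative for regress).
    incremental = _clip01(prev_progress + delta)
    # Forward-anchored: progress of the AFTER state measured from the initial state.
    forward = _clip01(progress_from_initial)
    # Backward-anchored: one minus the AFTER state's remaining distance to the goal.
    backward = _clip01(1.0 - remaining_to_goal)
    estimates = (incremental, forward, backward)
    # Weighted average; weights are normalized, so they need not sum to 1.
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def fuse_trajectory(grm_outputs: list[tuple[float, float, float]]) -> list[float]:
    """Roll the fusion forward over a trajectory, starting from zero progress."""
    progress = 0.0
    fused = []
    for delta, from_initial, to_goal in grm_outputs:
        # The incremental view builds on the previous fused value (a design choice).
        progress = fuse_progress(progress, delta, from_initial, to_goal)
        fused.append(progress)
    return fused
```

For example, `fuse_trajectory([(0.2, 0.2, 0.8), (0.3, 0.5, 0.5), (-0.1, 0.4, 0.6)])` yields approximately `[0.2, 0.5, 0.4]`, with the last step registering a regress. Anchoring two of the three views to fixed reference states is what keeps small per-step errors in the incremental view from accumulating unchecked.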

Good For

  • Developers and researchers working on robotic manipulation tasks that require precise, stable reward signals.
  • Applications where one-shot learning from demonstrations is crucial for task adaptation.
  • Integrating advanced reward modeling into existing reinforcement learning frameworks for robotics.
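As a framework-agnostic sketch of how a progress estimate could be plugged into an existing RL step loop, the block below assumes that policy-invariant shaping takes the classic potential-based form r' = r + γΦ(s') − Φ(s), which Ng, Harada, and Russell (1999) showed leaves the optimal policy unchanged, with the GRM's fused progress serving as the potential Φ. The card does not state the exact formula used. `PotentialShaper`, its `potential` callable, and the `progress` table are hypothetical stand-ins, not part of the model's API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PotentialShaper:
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s)."""

    # Hypothetical potential function, e.g. the GRM's fused progress for an observation.
    potential: Callable[[Any], float]
    gamma: float = 0.99

    def shape(self, reward: float, obs: Any, next_obs: Any, terminal: bool = False) -> float:
        phi_s = self.potential(obs)
        # Common convention: a terminal (absorbing) state has potential 0, so the
        # discounted shaping over an episode sums to -phi(s0), independent of the policy.
        phi_next = 0.0 if terminal else self.potential(next_obs)
        return reward + self.gamma * phi_next - phi_s


# Usage inside any step loop; the progress values below are made-up stand-ins.
progress = {"s0": 0.0, "s1": 0.3, "s2": 0.25, "s3": 0.9}
shaper = PotentialShaper(potential=progress.__getitem__, gamma=1.0)
trajectory = ["s0", "s1", "s2", "s3"]
shaped = [shaper.shape(0.0, s, s_next) for s, s_next in zip(trajectory, trajectory[1:])]
# With gamma = 1 the shaping terms telescope to phi(s3) - phi(s0) = 0.9.
```

The step from `s1` to `s2` receives a negative shaped reward because the assumed progress drops, which is how a dense regress signal reaches the agent without changing which policy is optimal.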