Name: tanhuajie2001/Robo-Dopamine-GRM-2.0-8B-Preview API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: tanhuajie2001

Robo-Dopamine-GRM-2.0-8B-Preview: General Process Reward Modeling for Robotics

This model, developed by tanhuajie2001, is an 8-billion-parameter vision-language model (VLM) engineered for high-precision robotic manipulation tasks. It takes a new approach to reward modeling that aims to provide stable, accurate reward signals for reinforcement learning in robotics.

Key Capabilities

General Reward Model (GRM): A core vision-language model that interprets task descriptions and multi-view images (initial, goal, "BEFORE," and "AFTER" states) to predict relative progress or regress.
Multi-Perspective Progress Fusion: Combines incremental, forward-anchored, and backward-anchored predictions to generate a robust and accurate fused reward signal.
Dopamine-RL Training Framework: Adapts the GRM to a new task from a single demonstration (one-shot adaptation).
Policy-Invariant Reward Shaping: Converts the GRM's dense output into an effective reward signal that accelerates learning without altering the optimal policy, making it compatible with various RL algorithms.
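
The fusion and shaping steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model's actual implementation: the names ProgressEstimates, fuse_progress, and shaped_reward are hypothetical, the unweighted mean is a stand-in for whatever fusion rule the model uses, and the shaping term follows the standard potential-based form r + gamma * Phi(s') - Phi(s), which is the classic way to add a dense reward without changing the optimal policy.

```python
from dataclasses import dataclass


@dataclass
class ProgressEstimates:
    """Three hypothetical task-progress estimates for one state, each in [0, 1].

    The field meanings are assumed for illustration; the listing only names
    the three perspectives.
    """
    incremental: float        # assumed: running progress accumulated step by step
    forward_anchored: float   # assumed: progress judged relative to the initial state
    backward_anchored: float  # assumed: progress judged relative to the goal state


def fuse_progress(est: ProgressEstimates) -> float:
    """Fuse the three estimates with an unweighted mean (an assumption), clipped to [0, 1]."""
    mean = (est.incremental + est.forward_anchored + est.backward_anchored) / 3.0
    return min(1.0, max(0.0, mean))


def shaped_reward(env_reward: float, progress_before: float,
                  progress_after: float, gamma: float = 0.99) -> float:
    """Potential-based shaping with the fused progress as the potential Phi.

    Returns env_reward + gamma * Phi(s') - Phi(s).
    """
    return env_reward + gamma * progress_after - progress_before


# Example transition: fused progress rises from about 0.3 to about 0.5.
before = fuse_progress(ProgressEstimates(0.30, 0.40, 0.20))
after = fuse_progress(ProgressEstimates(0.50, 0.60, 0.40))
reward = shaped_reward(0.0, before, after, gamma=1.0)  # about 0.2
```

Because the shaping term depends only on the potential at the two endpoint states, it can be added on top of the environment reward in any RL algorithm; a transition that loses progress yields a negative shaped reward.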

Good For

Developers and researchers working on robotic manipulation who need precise, stable reward signals.
Applications where adapting to a new task from a single demonstration is crucial.
Projects that integrate reward modeling into existing reinforcement learning frameworks for robotics.
