SpiceRL/DRA-GRPO

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: May 24, 2025 · License: cc-by-4.0 · Architecture: Transformer

SpiceRL/DRA-GRPO is a 1.5 billion parameter language model developed by SpiceRL with a 131072-token context length. It is distinguished by its use of Diversity-Aware Reward Adjustment (DRA) within the GRPO framework, a novel approach to R1-Zero-like reinforcement learning training of large language models. It is primarily intended for research and development in advanced RL post-training techniques.


DRA-GRPO Model Overview

SpiceRL/DRA-GRPO implements the training methodology introduced in the paper "DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models." At 1.5 billion parameters with a 131072-token context window, the model can process and generate extensive sequences of text while remaining small enough for research-scale experimentation.
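For context on the training recipe described below: GRPO (Group Relative Policy Optimization) scores each sampled completion relative to the other completions drawn for the same prompt, rather than using a learned value function. A minimal sketch of that group-relative advantage, assuming one verifiable scalar reward per completion (not the model's actual training code):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: for a group of G sampled
    completions for one prompt, normalize each completion's reward by
    the group's mean and standard deviation. No learned critic needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: 4 completions for one prompt, scored by a binary verifier.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Above-mean completions get positive advantage, below-mean negative.
```

Because advantages are normalized within each group, correct completions to an easy prompt (where most samples succeed) receive smaller advantages than correct completions to a hard one.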

Key Capabilities

  • Diversity-Aware Reward Adjustment (DRA): Adjusts each completion's reward based on its semantic diversity relative to the rest of the sampling group, so that redundant responses are down-weighted and distinctive correct responses contribute more to the policy update.
  • GRPO Framework: Builds on GRPO (Group Relative Policy Optimization), a policy optimization algorithm that normalizes rewards within each group of sampled completions, for stable and effective training without a learned value function.
  • R1-Zero-Like Training: Follows a training paradigm inspired by DeepSeek-R1-Zero, applying reinforcement learning with verifiable rewards directly to improve the reasoning performance of large language models.
  • Extended Context Length: Benefits from a 131072 token context, allowing for deep contextual understanding and generation over long inputs.
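The DRA idea above can be sketched as a reward reweighting step inserted before GRPO's group normalization. The paper measures diversity via submodular mutual information; the sketch below substitutes a simpler pairwise cosine-similarity redundancy penalty as an illustrative stand-in, and the `alpha` weight and embedding inputs are assumptions, not the released implementation:

```python
import numpy as np

def diversity_weights(embeddings, alpha=0.5):
    """Illustrative diversity weights: completions that are semantically
    redundant (high mean cosine similarity to the rest of the group) get
    their reward down-weighted. A stand-in for the paper's
    submodular-mutual-information-based measure."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = E @ E.T
    np.fill_diagonal(sim, 0.0)
    redundancy = sim.sum(axis=1) / (len(E) - 1)  # mean similarity to others
    return 1.0 - alpha * redundancy

def dra_grpo_advantages(rewards, embeddings, alpha=0.5):
    """Diversity-adjust the rewards, then apply GRPO's group-relative
    normalization."""
    r = np.asarray(rewards, dtype=float) * diversity_weights(embeddings, alpha)
    return (r - r.mean()) / (r.std() + 1e-8)

# Three equally-rewarded completions: two near-duplicates, one distinct.
# The distinct completion ends up with the largest advantage.
adv = dra_grpo_advantages(
    rewards=[1.0, 1.0, 1.0],
    embeddings=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
)
```

Note that with plain GRPO the three identical rewards would all yield zero advantage; the diversity adjustment is what breaks the tie in favor of the distinctive completion.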

Good For

  • RL Post-Training Research: Ideal for researchers and developers exploring advanced reinforcement learning techniques for language model post-training.
  • Experimental LLM Training: Suitable for experimenting with novel reward modeling and policy optimization strategies in language model development.
  • Understanding Diversity in LLMs: Provides a platform to study the impact of diversity-aware training on model outputs and alignment.

For more in-depth technical details, refer to the original research paper and the accompanying codebase.