SpiceRL/DRA-GRPO
Task: Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: May 24, 2025 · License: cc-by-4.0 · Architecture: Transformer · Open Weights

SpiceRL/DRA-GRPO is a 1.5 billion parameter language model developed by SpiceRL, featuring a 131072 token context length. The model is distinguished by its use of Diversity-Aware Reward Adjustment (DRA) within a GRPO framework, a novel approach to R1-Zero-like training of large language models. It is primarily intended for research and development in reinforcement-learning-based post-training techniques.
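The description above can be illustrated with a minimal sketch of how a diversity-aware adjustment might plug into GRPO's group-normalized advantages: rewards of redundant completions in a sampled group are down-weighted before normalization. The function names, the token-overlap similarity measure, and the weighting formula below are illustrative assumptions, not the model's actual training code.

```python
# Hedged sketch: diversity-aware reward adjustment in a GRPO-style update.
# All names and the similarity measure are illustrative assumptions.
from statistics import mean, pstdev


def jaccard_similarity(a, b):
    """Token-overlap similarity between two completions (a crude stand-in
    for whatever semantic similarity the real method uses)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def dra_grpo_advantages(completions, rewards, alpha=1.0):
    """Down-weight rewards of redundant completions, then compute
    GRPO-style group-normalized advantages (reward - mean) / std."""
    n = len(completions)
    adjusted = []
    for i in range(n):
        # Redundancy of completion i relative to the rest of the group.
        redundancy = mean(
            jaccard_similarity(completions[i], completions[j])
            for j in range(n) if j != i
        ) if n > 1 else 0.0
        # Diversity weight in (0, 1]: unique completions keep full reward.
        weight = 1.0 / (1.0 + alpha * redundancy)
        adjusted.append(weight * rewards[i])
    mu, sigma = mean(adjusted), pstdev(adjusted)
    return [(r - mu) / (sigma + 1e-8) for r in adjusted]


completions = [
    "the answer is 42",
    "the answer is 42",            # duplicate: gets a lower diversity weight
    "compute 6 times 7 to get 42",  # distinct reasoning: keeps more reward
]
rewards = [1.0, 1.0, 1.0]           # equal raw rewards for all three
advantages = dra_grpo_advantages(completions, rewards)
```

With equal raw rewards, the two duplicate completions end up with a lower advantage than the distinct one, which is the intended effect: the policy gradient is steered toward diverse correct solutions rather than a single redundant mode.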
