PRIME-RL/Eurus-2-7B-PRIME

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Dec 31, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Eurus-2-7B-PRIME is a 7.6-billion-parameter language model developed by PRIME-RL and trained with the Process Reinforcement through Implicit Reward (PRIME) method. Based on Qwen-2.5-Math-7B-Base, it substantially improves reasoning ability, particularly on mathematical and coding tasks. By leveraging online reinforcement learning with process rewards, it achieves large gains on key reasoning benchmarks, including improvements of over 20% on AMC & AIME competitions.


Overview

Eurus-2-7B-PRIME is a 7.6 billion parameter model from PRIME-RL, built upon Eurus-2-7B-SFT and based on Qwen-2.5-Math-7B-Base. It is distinguished by its training using the PRIME (Process Reinforcement through Implicit Reward) method, an open-source online reinforcement learning (RL) solution. This method focuses on advancing language models' reasoning capabilities beyond traditional imitation or distillation techniques by incorporating implicit process rewards.

Key Capabilities & Training

The PRIME method runs an online RL loop: it filters prompts, computes implicit process rewards, updates an implicit Process Reward Model (PRM), and then updates the policy model with a PPO loss. This approach yields significant gains on complex reasoning tasks. Notably, Eurus-2-7B-PRIME reached these results with far fewer data and model resources than Qwen-Math, using only one-tenth of the training data.
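The core idea of the implicit PRM is that token-level process rewards can be read off as a scaled log-probability ratio between the PRM and a frozen reference model, so no step-level labels are needed. The sketch below illustrates that reward computation only; the function name, `beta` value, and inputs are illustrative, not taken from the PRIME codebase.

```python
def implicit_process_rewards(prm_logprobs, ref_logprobs, beta=0.05):
    """Token-level implicit process rewards (illustrative sketch).

    Each reward is beta * (log pi_prm(y_t) - log pi_ref(y_t)):
    the PRM is parameterized as a log-prob ratio against a frozen
    reference model, so a positive reward means the PRM prefers
    that generation step over the reference.
    """
    return [beta * (p - r) for p, r in zip(prm_logprobs, ref_logprobs)]

# Toy example: log-probs for three generated tokens
prm = [-0.2, -1.0, -0.5]   # under the implicit PRM
ref = [-0.4, -0.9, -0.5]   # under the frozen reference
rewards = implicit_process_rewards(prm, ref, beta=0.05)
```

In the full loop these per-token rewards are combined with the outcome (verifier) reward to form the advantage used by the PPO policy update.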

Performance & Use Cases

Eurus-2-7B-PRIME demonstrates substantial improvements on key reasoning benchmarks, averaging 16.7% over its SFT version and exceeding 20% on AMC & AIME competitions. It surpasses the instruct version of its base model, Qwen-2.5-Math-7B-Instruct, on five key reasoning benchmarks. The model is tailored for coding and mathematical problem-solving, with optimized prompt formats for generating Python code and LaTeX-formatted mathematical answers. This makes it well suited to applications requiring robust logical and mathematical reasoning.
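Since the model expects task-specific prompt formats, a caller typically wraps the question with an instruction for the desired output form. The helpers below are hypothetical; the exact recommended wording ships with the official model card, so treat these strings as placeholders.

```python
def build_math_prompt(question: str) -> str:
    """Illustrative math prompt asking for a LaTeX \\boxed{} answer
    (wording is a placeholder, not the official template)."""
    return (
        f"{question}\n\n"
        "Present the final answer in LaTeX format: \\boxed{Your answer}"
    )

def build_code_prompt(task: str) -> str:
    """Illustrative coding prompt asking for a fenced Python block."""
    return (
        f"{task}\n\n"
        "Write Python code to solve the problem, and present it in a "
        "```python fenced code block."
    )

prompt = build_math_prompt("What is the sum of the first 10 positive integers?")
```

The resulting string is then passed to the model as a normal chat or completion request.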