PRIME-RL/Eurus-2-7B-PRIME

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Dec 31, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Eurus-2-7B-PRIME is a 7.6-billion-parameter language model developed by PRIME-RL and trained with the Process Reinforcement through Implicit Reward (PRIME) method. Based on Qwen-2.5-Math-7B-Base, it substantially improves reasoning ability, particularly on mathematical and coding tasks. By leveraging online reinforcement learning with process rewards, it achieves large gains on key reasoning benchmarks, including improvements of over 20% on AMC & AIME competitions.


Overview

Eurus-2-7B-PRIME is a 7.6 billion parameter model from PRIME-RL, built upon Eurus-2-7B-SFT and based on Qwen-2.5-Math-7B-Base. It is distinguished by its training using the PRIME (Process Reinforcement through Implicit Reward) method, an open-source online reinforcement learning (RL) solution. This method focuses on advancing language models' reasoning capabilities beyond traditional imitation or distillation techniques by incorporating implicit process rewards.

Key Capabilities & Training

The PRIME method runs an online RL loop: it filters prompts, computes implicit process rewards, updates an implicit Process Reward Model (PRM), and then updates the policy model with a PPO loss. This approach yields significant gains on complex reasoning tasks. Notably, Eurus-2-7B-PRIME reached these results with far fewer data and model resources than Qwen-Math, using only one-tenth of the training data.
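The core idea of the implicit PRM is that token-level process rewards can be read off as a scaled log-probability ratio between the PRM and a frozen reference model, so no step-level labels are needed. The sketch below illustrates that reward computation only; the function name, `beta` value, and inputs are illustrative, not taken from the PRIME codebase.

```python
def implicit_process_rewards(prm_logprobs, ref_logprobs, beta=0.05):
    """Token-level implicit process rewards (illustrative sketch).

    Each reward is beta * (log pi_prm(y_t) - log pi_ref(y_t)):
    the PRM is parameterized as a log-prob ratio against a frozen
    reference model, so a positive reward means the PRM prefers
    that generation step over the reference.
    """
    return [beta * (p - r) for p, r in zip(prm_logprobs, ref_logprobs)]

# Toy example: log-probs for three generated tokens
prm = [-0.2, -1.0, -0.5]   # under the implicit PRM
ref = [-0.4, -0.9, -0.5]   # under the frozen reference
rewards = implicit_process_rewards(prm, ref, beta=0.05)
```

In the full loop these per-token rewards are combined with the outcome (verifier) reward to form the advantage used by the PPO policy update.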

Performance & Use Cases

Eurus-2-7B-PRIME demonstrates substantial improvements on key reasoning benchmarks, averaging 16.7% over its SFT version and exceeding 20% on AMC & AIME competitions. It surpasses the instruct version of its base model, Qwen-2.5-Math-7B-Instruct, on five key reasoning benchmarks. The model is tailored for coding and mathematical problem-solving, with optimized prompt formats for generating Python code and LaTeX-formatted mathematical answers. This makes it well suited to applications requiring robust logical and mathematical reasoning.
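Since the model expects task-specific prompt formats, a caller typically wraps the question with an instruction for the desired output form. The helpers below are hypothetical; the exact recommended wording ships with the official model card, so treat these strings as placeholders.

```python
def build_math_prompt(question: str) -> str:
    """Illustrative math prompt asking for a LaTeX \\boxed{} answer
    (wording is a placeholder, not the official template)."""
    return (
        f"{question}\n\n"
        "Present the final answer in LaTeX format: \\boxed{Your answer}"
    )

def build_code_prompt(task: str) -> str:
    """Illustrative coding prompt asking for a fenced Python block."""
    return (
        f"{task}\n\n"
        "Write Python code to solve the problem, and present it in a "
        "```python fenced code block."
    )

prompt = build_math_prompt("What is the sum of the first 10 positive integers?")
```

The resulting string is then passed to the model as a normal chat or completion request.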