RLHFlow/Llama3.1-8B-ORM-Mistral-Data
RLHFlow/Llama3.1-8B-ORM-Mistral-Data is an 8-billion-parameter outcome-supervised reward model (ORM) derived from Meta's Llama-3.1-8B-Instruct, with a 32,768-token context window. It is trained on Mistral-generated data from the RLHFlow project to evaluate and improve the performance of language models on mathematical reasoning tasks. The model excels at scoring mathematical problem-solving outputs, and it significantly boosts accuracy on benchmarks like GSM8K and MATH when used for re-ranking or selection.
Overview
RLHFlow/Llama3.1-8B-ORM-Mistral-Data is an 8-billion-parameter outcome-supervised reward model (ORM) built upon Meta's Llama-3.1-8B-Instruct. It is fine-tuned for one epoch on the RLHFlow/Mistral-ORM-Data dataset, which consists of Mistral-generated mathematical problem-solving data. The model's primary function is to act as a robust evaluator for mathematical reasoning, capable of distinguishing correct solutions from incorrect ones and improving the overall performance of generative models through re-ranking or selection.
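In practice, a reward model of this kind is queried by feeding a question together with a candidate solution through the model and reading a correctness score off its output distribution. The sketch below is a minimal, unverified example along those lines: the "+"/"-" judgment-token convention and the chat formatting are assumptions borrowed from related RLHFlow reward models, so consult the RLHFlow repository for the exact prompt format.

```python
# Minimal scoring sketch. Assumes the RLHFlow convention of judging a
# solution by the relative probability of a "+" versus a "-" token at
# the final position; verify the prompt format against the RLHFlow repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RLHFlow/Llama3.1-8B-ORM-Mistral-Data"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

# Token ids for the "+" / "-" judgment tokens (assumed convention).
plus_id = tokenizer.encode("+", add_special_tokens=False)[-1]
minus_id = tokenizer.encode("-", add_special_tokens=False)[-1]

def score_solution(question: str, solution: str) -> float:
    """Return the model's estimated probability that `solution` is correct."""
    messages = [{"role": "user", "content": f"{question} {solution}"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
    # Normalize over the two judgment tokens only.
    probs = torch.softmax(logits[[plus_id, minus_id]], dim=-1)
    return probs[0].item()
```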
Key Capabilities
- Mathematical Reasoning Evaluation: Specialized in assessing the correctness and quality of solutions to mathematical problems.
- Performance Enhancement: Delivers significant gains on benchmarks like GSM8K and MATH when used to re-rank or select outputs from generative models (see the re-ranking sketch after this list).
- Out-of-Distribution Robustness: Remains effective when scoring outputs from generators that were not part of its training data, such as Deepseek-7B (out-of-distribution evaluation).
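Re-ranking with the ORM amounts to best-of-n selection: sample several candidate solutions from a generator and keep the one the reward model scores highest. A minimal sketch, assuming a scoring function like the `score_solution` helper sketched above:

```python
from typing import Callable, List

def best_of_n(
    question: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate solution with the highest reward-model score."""
    return max(candidates, key=lambda c: score_fn(question, c))

# Hypothetical usage: sample n solutions from any generator, then select.
# candidates = [generate(question) for _ in range(n)]
# best = best_of_n(question, candidates, score_solution)
```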
Performance Highlights
The model significantly boosts the performance of generative models on mathematical tasks:

| Generator | GSM8K Pass@1 | GSM8K w/ Mistral-ORM@1024 | MATH Pass@1 | MATH w/ Mistral-ORM@1024 |
|---|---|---|---|---|
| Mistral-7B | 77.9 | 90.1 | 28.4 | 43.6 |
| Deepseek-7B (OOD) | 83.9 | 90.3 | 38.4 | 54.9 |
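Here "Mistral-ORM@1024" denotes best-of-1024 selection: sample n = 1024 candidate solutions from the generator and keep the one this ORM scores highest. Formally, this is the standard best-of-n formulation (stated here for clarity, not quoted from the model card):

```latex
\hat{y} \;=\; \operatorname*{arg\,max}_{1 \le i \le n} \; r_\theta(x, y_i),
\qquad y_1, \dots, y_n \sim \pi_{\text{gen}}(\cdot \mid x)
```

where $r_\theta$ is this ORM, $\pi_{\text{gen}}$ is the generator (e.g., Mistral-7B), and $n = 1024$.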
Good For
- Automated Evaluation: Scoring and ranking generated solutions for complex mathematical problems.
- Improving LLM Math Performance: Integrating into pipelines to select higher-quality mathematical outputs from other LLMs.
- Research in Reward Modeling: Exploring outcome-supervised reward modeling techniques for specialized tasks.