PRIME-RL/Eurus-2-7B-SFT

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Dec 30, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Eurus-2-7B-SFT is a 7.6 billion parameter language model developed by PRIME-RL, fine-tuned from Qwen2.5-Math-7B-Base. It specializes in mathematical and coding reasoning tasks, leveraging an action-centric chain-of-thought dataset for supervised fine-tuning. This model serves as a foundational stage for more advanced process reinforcement learning models, offering strong performance in structured problem-solving with a 131072 token context length.

Loading preview...

Eurus-2-7B-SFT Overview

Eurus-2-7B-SFT is a 7.6 billion parameter model from PRIME-RL, built upon the Qwen2.5-Math-7B-Base architecture, known for its strong mathematical capabilities. This model undergoes supervised fine-tuning (SFT) using the Eurus-2-SFT-Data, which is an action-centric chain-of-thought reasoning dataset. The SFT process is designed as a warm-up stage to teach the model effective reasoning patterns, preparing it for subsequent process reinforcement learning (PRIME) stages, such as with Eurus-2-7B-PRIME.

Key Capabilities

  • Enhanced Reasoning: Specialized in learning and applying reasoning patterns through an action-centric chain-of-thought approach.
  • Mathematical Problem Solving: Inherits and further develops strong mathematical capabilities from its base model, Qwen2.5-Math-7B-Base.
  • Code Generation: Demonstrates proficiency in generating Python code for problem-solving, with tailored prompting for optimal results.
  • Structured Output: Designed to present mathematical answers in LaTeX format and code in markdown blocks, facilitating clear and structured responses.
  • Long Context: Supports a substantial context length of 131072 tokens, enabling it to handle complex and lengthy problem descriptions.

Ideal Use Cases

  • Mathematical Assistance: Solving complex math problems and presenting solutions in a standardized LaTeX format.
  • Code Generation & Explanation: Generating Python code snippets and assisting with coding challenges.
  • Educational Tools: As a backend for applications requiring step-by-step reasoning in technical domains.
  • Research & Development: Serving as a robust starting point for further research into process reinforcement learning and advanced reasoning models.