hkust-nlp/dart-math-mistral-7b-prop2diff

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Jun 5, 2024 · License: apache-2.0 · Architecture: Transformer

The hkust-nlp/dart-math-mistral-7b-prop2diff model is a 7 billion parameter Mistral-based language model developed by hkust-nlp and fine-tuned with DART-Math (Difficulty-Aware Rejection Tuning). The model is optimized for mathematical problem solving, with strong results on in-domain benchmarks and on challenging out-of-domain math benchmarks. Its 'Prop2Diff' sampling strategy counters the bias towards easy queries that plain rejection sampling introduces into synthetic training datasets, making it particularly effective on complex mathematical reasoning tasks.


DART-Math-Mistral-7B-Prop2Diff Overview

This model is a 7 billion parameter variant of the Mistral architecture, developed by hkust-nlp and fine-tuned with the DART-Math (Difficulty-Aware Rejection Tuning) methodology. DART-Math addresses a key limitation of vanilla rejection sampling for mathematical data: because easy queries are answered correctly far more often, the resulting synthetic datasets end up heavily biased towards easy problems. By allocating the synthesis budget with a 'Prop2Diff' (proportional to difficulty) strategy, this model is trained on data in which challenging queries are better represented, improving performance on complex mathematical tasks.
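
The allocation idea can be sketched in a few lines of Python. This is a minimal illustration, assuming difficulty is estimated as a sampler model's observed failure rate on each query; the function and variable names below are hypothetical, not taken from the DART-Math codebase.

```python
# Hypothetical sketch of Prop2Diff budget allocation: harder queries
# (higher observed failure rate) receive proportionally more of the
# total synthetic-response budget.

def prop2diff_budget(fail_rates, total_budget, min_per_query=1):
    """Return a per-query response count proportional to failure rate."""
    total_difficulty = sum(fail_rates)
    if total_difficulty == 0:
        return [min_per_query] * len(fail_rates)  # all queries trivially easy
    return [
        max(min_per_query, round(fr / total_difficulty * total_budget))
        for fr in fail_rates
    ]

# Three queries with failure rates 0.1, 0.5, 0.9 split a budget of
# 150 responses roughly as 10 / 50 / 90.
print(prop2diff_budget([0.1, 0.5, 0.9], total_budget=150))
```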

Key Capabilities and Performance

  • Enhanced Mathematical Reasoning: Achieves strong results on a range of mathematical benchmarks, including MATH, GSM8K, CollegeMath, DeepMind Mathematics, OlympiadBench-Math, and TheoremQA.
  • Outperforms Baselines: Demonstrates superior or competitive performance compared to other models in its size class, such as Mistral-7B-MetaMath, on challenging out-of-domain math problems.
  • Difficulty-Aware Training: Constructs its training data with DARS (Difficulty-Aware Rejection Sampling), which counters the severe bias towards easy queries found in standard rejection-sampled mathematical datasets; a minimal sketch of the rejection step follows this list.
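
The rejection step itself can be sketched as follows: candidate solutions are sampled per query, and only those whose final answer matches the reference are kept, until the difficulty-derived target count is reached. `generate_solution` and `extract_answer` are hypothetical stand-ins for the sampler model and answer parser, not names from the DART-Math codebase.

```python
# Minimal rejection-sampling sketch: keep only candidates whose final
# answer matches the reference, up to a per-query target count.

def collect_correct_responses(query, reference_answer, target_count,
                              generate_solution, extract_answer,
                              max_attempts=1024):
    kept = []
    attempts = 0
    while len(kept) < target_count and attempts < max_attempts:
        attempts += 1
        candidate = generate_solution(query)            # one sampled solution
        if extract_answer(candidate) == reference_answer:
            kept.append(candidate)                      # survives the filter
    return kept
```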

Training Details

  • Base Model: Mistral-7B.
  • Training Data: Synthetic datasets derived from MATH and GSM8K training sets, processed with Difficulty-Aware Rejection Sampling.
  • Prompt Template: Uses the Alpaca prompt template (see the inference sketch after this list).
  • Max Sequence Length: 4096 tokens.
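
Below is a minimal inference sketch, assuming the standard Alpaca instruction template and the Hugging Face transformers API; the example question and decoding settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hkust-nlp/dart-math-mistral-7b-prop2diff"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

question = "What is the smallest positive multiple of 7 greater than 50?"
# Standard Alpaca instruction template.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{question}\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)  # well within the 4096-token context
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```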

When to Use This Model

This model is well-suited to applications requiring robust mathematical problem solving across a wide range of difficulty levels. Because its training data deliberately over-represents hard queries, it is a strong candidate for tasks where comparably sized models struggle with more complex or less common mathematical problems.