Overview
MAmmoTH2-8B: Enhanced Reasoning through Web-Scale Instruction Tuning
MAmmoTH2-8B, developed by TIGER-Lab, is an 8-billion-parameter model built on the Llama-3 architecture and designed to significantly improve the reasoning capabilities of large language models. It achieves this through an instruction tuning approach that efficiently harvests 10 million high-quality instruction-response pairs from the pre-training web corpus.
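For orientation, here is a minimal inference sketch using the Hugging Face transformers library. The Hub repo id `TIGER-Lab/MAmmoTH2-8B`, the bf16 setting, and the presence of a Llama-3-style chat template are assumptions for illustration, not confirmed details of the release.

```python
# Minimal inference sketch. The repo id below is an assumption based on the
# model name; adjust it to the actual published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TIGER-Lab/MAmmoTH2-8B"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights; fits a single 24 GB GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```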
Key Capabilities and Features
- Enhanced Reasoning: Delivers substantial gains on reasoning benchmarks, most visibly on MATH and GSM8K (concrete scores listed below).
- Cost-Effective Training: Utilizes a novel method for acquiring large-scale, high-quality instruction data, offering an efficient alternative to traditional domain-specific training.
- Broad Applicability: The base MAmmoTH2 models improve reasoning without training on domain-specific data, while MAmmoTH2-Plus variants further enhance performance across reasoning and chatbot benchmarks by incorporating public instruction tuning datasets.
- Strong Benchmarks: Achieves competitive results on various evaluation datasets, including TheoremQA (30.3%), MATH (35.8%), GSM8K (70.4%), and MMLU-ST (64.2%).
Ideal Use Cases
- Mathematical Problem Solving: Excels at open-ended and multiple-choice math problems, making it suitable for applications that require strong quantitative reasoning (see the example sketch after this list).
- General Reasoning Tasks: Applicable to a wide range of tasks benefiting from improved logical and analytical processing.
- Research and Development: Provides a robust foundation for further research into instruction tuning and reasoning enhancement in LLMs.
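Building on the loading sketch above, the snippet below shows what a quantitative-reasoning query might look like. The step-by-step phrasing is a common prompting convention, not a documented requirement of the model.

```python
# Continues the loading sketch above (reuses `model` and `tokenizer`).
problem = (
    "A store sells pencils in packs of 12 for $3. "
    "How much do 60 pencils cost? Show your reasoning step by step."
)
messages = [{"role": "user", "content": problem}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the arithmetic deterministic across runs.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```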