nvidia/AceInstruct-72B

Status: Warm
Visibility: Public
Parameters: 72.7B
Precision: FP8
Context length: 131,072 tokens
Released: Jan 15, 2025
License: cc-by-nc-4.0
Weights: Hugging Face
Overview

AceInstruct-72B: Versatile Instruction-Tuned Model

AceInstruct-72B is a 72.7 billion parameter instruction-tuned model developed by NVIDIA, built on the Qwen2.5-Base architecture. It is part of the AceInstruct family, which also includes 1.5B and 7B parameter variants, all fine-tuned on general supervised fine-tuning (SFT) datasets, including those used for AceMath-Instruct. Unlike the math-specialized AceMath models, AceInstruct-72B is designed for broad applicability across domains.

Key Capabilities & Performance

AceInstruct-72B demonstrates strong performance across coding, mathematics, and general knowledge tasks, often matching or slightly surpassing its Qwen2.5-72B-Instruct counterpart. Notable benchmark results include:

  • HumanEval (Coding): 89.63
  • GSM8K (Math): 96.36
  • MATH (Math): 84.50
  • MMLU (General Knowledge): 83.88

This versatility makes the model suitable for a wide range of instruction-following tasks. It also supports a 131,072-token context window, enabling it to process very long inputs.
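As a rough illustration of how the model might be used, below is a minimal inference sketch with the Hugging Face transformers library, assuming the checkpoint follows the standard Qwen2.5-style chat template (the helper names `build_messages` and `generate` are illustrative, not part of any official API; running the 72B checkpoint requires substantial GPU memory):

```python
MODEL_ID = "nvidia/AceInstruct-72B"


def build_messages(prompt: str) -> list[dict]:
    # Chat-style models take a list of role/content dicts; the tokenizer's
    # chat template turns this into the model's actual prompt string.
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Render the conversation with the model's chat template, then generate.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Write a Python function that checks if a number is prime."))
```

The same message-list structure works for multi-turn conversations by appending alternating user and assistant entries before re-applying the chat template.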

Training and Resources

The model was fine-tuned from Qwen2.5-Base using the AceMath-Instruct-Training-Data and other general SFT datasets. For further details, refer to the NVIDIA research website and the associated paper.