AceInstruct-72B: Versatile Instruction-Tuned Model
AceInstruct-72B is a 72.7-billion-parameter instruction-tuned model developed by NVIDIA, built on the Qwen2.5-Base model. It is part of the AceInstruct family, which also includes 1.5B and 7B variants, all fine-tuned on general SFT datasets, including those used for AceMath-Instruct. Unlike the math-specialized AceMath models, AceInstruct-72B is designed for broad applicability across domains.
Key Capabilities & Performance
AceInstruct-72B demonstrates strong performance across coding, mathematics, and general knowledge tasks, often matching or slightly surpassing its Qwen2.5-72B-Instruct counterpart. Notable benchmark results include:
- HumanEval (Coding): 89.63
- GSM8K (Math): 96.36
- MATH (Math): 84.50
- MMLU (General Knowledge): 83.88
This versatility makes the model suitable for a wide array of instruction-following tasks. It also supports a substantial 131,072-token context window, enabling it to process extensive inputs.
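Since AceInstruct-72B is built on Qwen2.5, it is typically prompted in the ChatML-style conversation format used by that family. The sketch below hand-rolls that format purely for illustration; the `build_chatml_prompt` helper and the exact special tokens are assumptions here, and in practice you would rely on the tokenizer's built-in chat template (e.g. `tokenizer.apply_chat_template` in Hugging Face Transformers) rather than formatting prompts manually.

```python
# Illustrative sketch of the ChatML-style prompt format assumed for
# Qwen2.5-based models such as AceInstruct-72B. The <|im_start|> and
# <|im_end|> tokens are an assumption based on the Qwen family; prefer
# the tokenizer's own chat template in real use.

def build_chatml_prompt(messages):
    """Render a list of {"role": ..., "content": ...} dicts as one prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```

The resulting string would then be tokenized and passed to the model; the open `assistant` turn at the end marks where generation begins.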
Training and Resources
The model was fine-tuned on Qwen2.5-Base using the AceMath-Instruct-Training-Data and other general SFT datasets. For more detailed information, refer to the NVIDIA research website and the associated paper.