nvidia/AceInstruct-72B

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 72.7B · Quant: FP8 · Ctx Length: 32k · Published: Jan 15, 2025 · License: cc-by-nc-4.0 · Architecture: Transformer · Open Weights

AceInstruct-72B is a 72.7-billion-parameter instruction-tuned causal language model developed by NVIDIA and fine-tuned from Qwen2.5-Base. It is designed for versatile use across coding, mathematics, and general-purpose tasks, with performance comparable to Qwen2.5-72B-Instruct. The model supports a 131,072-token context length and belongs to a family of Qwen-derived models that excel across a broad range of domains.


AceInstruct-72B: Versatile Instruction-Tuned Model

AceInstruct-72B is a 72.7-billion-parameter instruction-tuned model developed by NVIDIA, built on the Qwen2.5-Base architecture. It is part of the AceInstruct family, which also includes 1.5B and 7B parameter variants, all fine-tuned on general SFT datasets, including those used for AceMath-Instruct. Unlike the math-specialized AceMath models, AceInstruct-72B is designed for broad applicability across varied domains.

Key Capabilities & Performance

AceInstruct-72B demonstrates strong performance across coding, mathematics, and general knowledge tasks, often matching or slightly surpassing its Qwen2.5-72B-Instruct counterpart. Notable benchmark results include:

  • HumanEval (Coding): 89.63
  • GSM8K (Math): 96.36
  • MATH (Math): 84.50
  • MMLU (General Knowledge): 83.88

This versatility makes the model suitable for a wide array of instruction-following tasks, and its substantial 131,072-token context length enables processing of extensive inputs.
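Because AceInstruct-72B is fine-tuned from Qwen2.5-Base, it presumably follows Qwen's ChatML-style conversation format. The sketch below illustrates, under that assumption, how a multi-turn prompt is laid out; in practice you would call `tokenizer.apply_chat_template` from the Hugging Face `transformers` library rather than formatting prompts by hand.

```python
# Hypothetical sketch of a ChatML-style prompt layout, assuming AceInstruct-72B
# inherits the Qwen2.5 chat template. Prefer tokenizer.apply_chat_template
# from transformers in real use.

def format_chatml(messages):
    """Render a list of {role, content} dicts into a ChatML prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python function that sums a list."},
])
print(prompt.splitlines()[0])  # → <|im_start|>system
```

The string produced this way would be tokenized and passed to the model as-is; the open `assistant` turn at the end is what prompts the model to continue with its answer.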

Training and Resources

The model was fine-tuned from Qwen2.5-Base using the AceMath-Instruct training data alongside other general SFT datasets. For more detail, refer to the NVIDIA research website and the associated paper.