FriendliAI/Llama-3_3-Nemotron-Super-49B-v1_5

TEXT GENERATIONConcurrency Cost:3Model Size:49BQuant:FP8Ctx Length:32kPublished:Oct 15, 2025License:nvidia-open-model-licenseArchitecture:Transformer Open Weights Cold

FriendliAI/Llama-3_3-Nemotron-Super-49B-v1_5 is a 49 billion parameter large language model developed by NVIDIA, derived from Meta's Llama-3.3-70B-Instruct. It is specifically post-trained for enhanced reasoning, human chat preferences, and agentic tasks like RAG and tool calling, supporting an extended context length of 128K tokens. This model utilizes a novel Neural Architecture Search (NAS) approach to optimize for accuracy-efficiency trade-offs, enabling high performance with reduced memory footprint. It excels in general reasoning, math, code, and instruction-following tasks, making it suitable for AI agent systems and chatbots.

Loading preview...

Model Overview

FriendliAI/Llama-3_3-Nemotron-Super-49B-v1_5 is a 49 billion parameter large language model developed by NVIDIA, building upon Meta's Llama-3.3-70B-Instruct. It features a significantly extended context length of 128K tokens, making it suitable for complex, long-form interactions.

Key Capabilities

  • Enhanced Reasoning: Post-trained with Reinforcement Learning with Verifiable Rewards (RLVR) to improve reasoning capabilities across various domains.
  • Agentic Tasks: Optimized for human chat preferences, RAG (Retrieval Augmented Generation), and advanced tool calling through iterative Direct Preference Optimization (DPO).
  • Efficiency: Leverages a novel Neural Architecture Search (NAS) approach, detailed in this paper, to achieve an optimal balance between accuracy and computational efficiency, allowing for larger workloads and deployment on single H200 GPUs.
  • Multilingual Support: Primarily designed for English and coding languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Robust Training: Underwent multi-phase post-training including supervised fine-tuning for Math, Code, Science, and Tool Calling, and multiple stages of Reinforcement Learning.

Performance Highlights

The model demonstrates strong performance across various benchmarks in "Reasoning On" mode, including:

  • MATH500: 97.4 pass@1
  • AIME 2024: 87.5 pass@1
  • LiveCodeBench 24.10-25.02: 73.58 pass@1
  • MMLU Pro (CoT): 79.53 pass@1

Good For

  • Developers designing AI Agent systems requiring advanced reasoning and tool-use.
  • Building sophisticated chatbots and RAG systems that demand high accuracy and efficiency.
  • Applications requiring strong instruction-following capabilities in English and coding languages.
  • Use cases where cost-efficiency and single-GPU deployment for large models are critical.