nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 is a 49-billion-parameter large language model developed by NVIDIA and derived from Meta's Llama-3.3-70B-Instruct. Its architecture was obtained with a Neural Architecture Search (NAS) approach that balances accuracy and efficiency, and the model was post-trained for reasoning, human chat preferences, and agentic tasks such as RAG and tool calling. With a 128K-token context length, it targets developers building AI agent systems, chatbots, and RAG applications, and it is sized to run on a single GPU.
Model Overview
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 is a 49-billion-parameter large language model developed by NVIDIA, derived from Meta's Llama-3.3-70B-Instruct. Its architecture was produced by a Neural Architecture Search (NAS) approach that trades off accuracy against efficiency, so the model can serve larger workloads while fitting on a single H200 GPU. It supports a context length of 128K tokens.
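To make the single-GPU claim concrete, here is a rough back-of-envelope sketch of weight memory. It rests on assumptions not stated in this overview: weights stored in BF16 (2 bytes per parameter) and an H200 capacity of 141 GB taken from NVIDIA's public specification. It ignores KV cache, activations, and runtime overhead, so treat the result as an estimate rather than a deployment guarantee.

```python
# Back-of-envelope weight-memory estimate.
# Assumptions (not from the model card): BF16/FP8 storage sizes and
# an H200 memory capacity of 141 GB per NVIDIA's public specification.
H200_MEMORY_GB = 141
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(params: float, precision: str) -> float:
    """Return memory needed for the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def headroom_gb(params: float, precision: str,
                capacity_gb: float = H200_MEMORY_GB) -> float:
    """Return capacity left for KV cache and activations after loading weights."""
    return capacity_gb - weight_memory_gb(params, precision)


# Compare the 49B model with its 70B parent at the same precision.
for name, params in [("49B", 49e9), ("70B", 70e9)]:
    print(name, weight_memory_gb(params, "bf16"), headroom_gb(params, "bf16"))
```

Under these assumptions the 49B weights take about 98 GB and leave roughly 43 GB for the KV cache, whereas the 70B parent would take about 140 GB and leave almost nothing.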
Key Capabilities
- Enhanced Reasoning & Agentic Tasks: Post-trained for reasoning, human chat preferences, and agentic tasks such as RAG and tool calling, including supervised fine-tuning for Math, Code, Science, and Tool Calling.
- Multi-stage Reinforcement Learning: Underwent multiple stages of RL, including Reward-aware Preference Optimization (RPO) for chat, Reinforcement Learning with Verifiable Rewards (RLVR) for reasoning, and iterative Direct Preference Optimization (DPO) for tool calling.
- Efficiency Optimization: Uses a NAS approach to reduce memory footprint and improve throughput, which keeps the model efficient under heavy workloads.
- Multilingual Support: Intended primarily for English and programming languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
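As an illustration of what tool calling involves on the application side, the following is a minimal sketch of a tool registry and dispatcher. Everything in it is hypothetical: the `get_weather` function, the JSON-schema style description (borrowed from the convention used by OpenAI-compatible APIs), and the `{"name": ..., "arguments": {...}}` call format. This overview does not specify the format the model actually emits, so check the model's chat template before relying on any of these shapes.

```python
import json


# Hypothetical tool; a real application would call an actual service here.
def get_weather(city: str) -> str:
    return f"Weather lookup for {city} is not implemented in this sketch."


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

# Assumed JSON-schema style description that would be passed to the model.
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def dispatch_tool_call(raw_call: str) -> str:
    """Parse a JSON tool call of the assumed form and run the matching tool."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


print(dispatch_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Keeping the registry separate from the schemas means an unknown tool name fails loudly instead of silently executing arbitrary code.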
Good For
- Designing AI agent systems.
- Building advanced chatbots and conversational AI.
- Implementing RAG (Retrieval-Augmented Generation) systems.
- Building applications that require robust tool calling.
- Handling general instruction-following tasks where reasoning is critical.
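For readers new to RAG, the retrieval-then-prompt pattern mentioned above can be sketched in a few lines. This is a toy illustration under loud assumptions: relevance is scored by simple word overlap instead of embeddings, whitespace-separated words stand in for tokens when enforcing the context budget, and the prompt wording is invented for the example. None of it reflects how NVIDIA evaluates or recommends using the model.

```python
def score(query: str, passage: str) -> int:
    """Toy relevance score: number of distinct query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words)


def build_rag_prompt(query: str, passages: list[str],
                     max_context_words: int = 200) -> str:
    """Pick the most relevant passages that fit the budget and build a prompt."""
    # Drop passages with no overlap, then rank the rest by score (highest first).
    relevant = [p for p in passages if score(query, p) > 0]
    ranked = sorted(relevant, key=lambda p: score(query, p), reverse=True)

    selected, used = [], 0
    for passage in ranked:
        length = len(passage.split())  # word count as a crude token proxy
        if used + length > max_context_words:
            continue  # skip passages that would exceed the budget
        selected.append(passage)
        used += length

    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(selected))
    return (f"Context:\n{context}\n\nQuestion: {query}\n"
            "Answer using only the context above.")


passages = [
    "The H200 is a GPU.",
    "Bananas are yellow.",
    "NAS reduces the memory footprint of the model.",
]
print(build_rag_prompt("What reduces the memory footprint?", passages))
```

In a real system the budget would be expressed in tokens from the model's own tokenizer and sized against the 128K-token context window, leaving room for the question and the generated answer.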