Name: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: nvidia

Model Overview

nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 is a 49 billion parameter large language model developed by NVIDIA, derived from Meta's Llama-3.3-70B-Instruct. It features a significantly upgraded architecture achieved through a novel Neural Architecture Search (NAS) approach, which optimizes for an accuracy-efficiency trade-off, enabling larger workloads and fitting on a single H200 GPU. This model supports an extended context length of 128K tokens.

Key Capabilities

Enhanced Reasoning & Agentic Tasks: Post-trained for superior reasoning, human chat preferences, and agentic tasks such as RAG and tool calling. This includes supervised fine-tuning for Math, Code, Science, and Tool Calling.
Multi-stage Reinforcement Learning: Underwent multiple stages of RL, including Reward-aware Preference Optimization (RPO) for chat, Reinforcement Learning with Verifiable Rewards (RLVR) for reasoning, and iterative Direct Preference Optimization (DPO) for tool calling.
Efficiency Optimization: Utilizes a NAS approach to reduce memory footprint and improve throughput, making it efficient for high workloads.
Multilingual Support: Primarily designed for English and coding languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Good For

Developers designing AI Agent systems.
Building advanced chatbots and conversational AI.
Implementing RAG (Retrieval Augmented Generation) systems.
Applications requiring robust tool calling capabilities.
General instruction-following tasks where reasoning is critical.

Overview

Model Overview

Key Capabilities

Good For

Full Model Card (README)