Name: unsloth/Llama-3_3-Nemotron-Super-49B-v1_5 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: unsloth

Model Overview

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49 billion parameter large language model developed by NVIDIA, building upon Meta's Llama-3.3-70B-Instruct. This model is specifically post-trained to enhance reasoning capabilities, align with human chat preferences, and perform agentic tasks such as Retrieval Augmented Generation (RAG) and tool calling. It supports an extended context length of 128K tokens.

Key Differentiators

Efficiency and Accuracy Trade-off: Employs a novel Neural Architecture Search (NAS) approach, detailed in this paper, to significantly reduce memory footprint and enable deployment on a single GPU (e.g., H200) while maintaining high accuracy.
Multi-phase Post-training: Underwent extensive supervised fine-tuning for Math, Code, Science, and Tool Calling. It also incorporates multiple stages of Reinforcement Learning (RL), including Reward-aware Preference Optimization (RPO) for chat and Reinforcement Learning with Verifiable Rewards (RLVR) for reasoning, alongside iterative Direct Preference Optimization (DPO) for tool calling.
Optimized Architecture: Utilizes a customized Llama 3.3 70B Instruct network architecture with non-standard and non-repetitive blocks, including skip attention and variable FFN ratios, derived from NAS and block-wise distillation.

Intended Use Cases

AI Agent Systems: Ideal for developers building sophisticated AI agents.
Chatbots: Suitable for creating advanced conversational AI applications.
RAG Systems: Designed to perform well in Retrieval Augmented Generation scenarios.
Tool Calling: Enhanced capabilities for integrating with external tools and functions.
Instruction Following: General-purpose instruction-following tasks in English and coding languages, with support for several other non-English languages.

Overview

Model Overview

Key Differentiators

Intended Use Cases

Full Model Card (README)