nvidia/EGM-4B-SFT
nvidia/EGM-4B-SFT is a 4-billion-parameter supervised fine-tuning (SFT) checkpoint from the Efficient Visual Grounding Language Models (EGM) training pipeline, built on Qwen3-VL-4B-Thinking. Developed by NVIDIA, it is an intermediate checkpoint intended for further reinforcement learning (RL) training to reach optimal visual grounding performance. It was fine-tuned on reasoning-augmented data generated by a proprietary VLM to learn structured visual grounding with explicit reasoning.
Overview
nvidia/EGM-4B-SFT is a 4-billion-parameter supervised fine-tuning (SFT) checkpoint from NVIDIA's Efficient Visual Grounding Language Models (EGM) project. It is an intermediate checkpoint derived from Qwen3-VL-4B-Thinking and is specifically designed to serve as the initialization for subsequent reinforcement learning (RL) training. Its primary purpose is to learn structured visual grounding with explicit reasoning.
Key Capabilities & Training
This SFT model is produced by fine-tuning the base Qwen3-VL-4B-Thinking model on reasoning-augmented visual grounding data. The data includes detailed chain-of-thought reasoning steps generated by a proprietary vision-language model (VLM), enabling the model to develop explicit reasoning capabilities for visual grounding tasks. The architecture is Qwen3VLForConditionalGeneration, and the weights are stored in bfloat16 precision.
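A minimal loading and inference sketch with Hugging Face Transformers is shown below, assuming a recent transformers release that ships Qwen3VLForConditionalGeneration and AutoProcessor support for this architecture; the image URL, grounding prompt, and generation settings are illustrative placeholders rather than an officially documented interface.

```python
import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model_id = "nvidia/EGM-4B-SFT"

# Load the checkpoint in bfloat16, matching the precision it was trained in.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Illustrative grounding request; the exact prompt wording and output format
# (e.g., how bounding boxes are expressed) may differ from what the model expects.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/street_scene.jpg"},
            {"type": "text", "text": "Locate the red traffic light and give its bounding box."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding the model's reasoning and answer.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```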
Intended Use
nvidia/EGM-4B-SFT is not intended for direct end-user deployment as a final model. It is provided for developers and researchers who want to perform further reinforcement learning training on top of it. For the best-performing final EGM model, refer to nvidia/EGM-4B, which results from the subsequent RL stage (GRPO, Group Relative Policy Optimization) initialized with this SFT checkpoint.
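For illustration only, the sketch below shows one way such an RL stage could be initialized from this checkpoint using TRL's GRPOTrainer. This is not NVIDIA's actual recipe: the reward function, dataset, and hyperparameters are placeholders, and it assumes a TRL version whose GRPO implementation supports vision-language models.

```python
import torch
from datasets import load_dataset
from transformers import Qwen3VLForConditionalGeneration
from trl import GRPOConfig, GRPOTrainer

# Initialize the RL policy from the SFT checkpoint.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "nvidia/EGM-4B-SFT", torch_dtype=torch.bfloat16
)

def grounding_reward(completions, **kwargs):
    """Placeholder reward; a real setup might score, e.g., IoU between
    predicted and reference boxes parsed from each completion."""
    return [0.0 for _ in completions]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=grounding_reward,
    args=GRPOConfig(output_dir="egm-4b-grpo", bf16=True),
    # Hypothetical grounding-prompt dataset with a "prompt" column.
    train_dataset=load_dataset("json", data_files="grounding_prompts.jsonl", split="train"),
)
trainer.train()
```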