Name: nvidia/EGM-8B-SFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nvidia

Overview

nvidia/EGM-8B-SFT is an 8-billion-parameter supervised fine-tuning (SFT) checkpoint developed by NVIDIA as part of its Efficient Visual Grounding Language Models (EGM) initiative. It is based on the Qwen3-VL-8B-Thinking architecture and supports a context length of 32,768 tokens. The model is an intermediate checkpoint intended for further reinforcement learning (RL) training; the final, best-performing model is available as nvidia/EGM-8B.

Key Capabilities & Training

Visual Grounding: Specifically fine-tuned for visual grounding tasks, learning to associate language with visual elements.
Reasoning-Augmented Data: Trained on chain-of-thought reasoning steps generated by a proprietary VLM, so that the model produces explicit reasoning before its grounding output.
SFT Stage: Represents the supervised fine-tuning stage and serves as the starting point for the subsequent GRPO (RL) stage.
Architecture: Uses the Qwen3VLForConditionalGeneration architecture in bfloat16 precision, with 36 text layers and 27 vision layers.
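
As a rough illustration of what bfloat16 precision means for a model of this size, the sketch below estimates the memory needed for the weights alone. It assumes exactly 8 billion parameters (the real count may differ slightly) and 2 bytes per parameter; activations, the KV cache, and framework overhead are ignored. The function name is hypothetical.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Estimate weight storage in GiB (bfloat16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30


# Assumption: a round 8 billion parameters, taken from the "8B" in the model name.
NUM_PARAMS = 8_000_000_000

print(f"{weight_memory_gib(NUM_PARAMS):.1f} GiB")  # prints "14.9 GiB"
```

In practice, running or fine-tuning the checkpoint needs noticeably more memory than this figure, since it covers only the stored weights.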

Intended Use

RL Training Initialization: Primarily designed as an initialization checkpoint for developers to conduct their own reinforcement learning training using the EGM framework.
Research & Development: Suitable for researchers exploring visual grounding, multi-modal models, and RL-based fine-tuning techniques.
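
For readers unfamiliar with the GRPO stage that this checkpoint is meant to initialize, the sketch below shows the group-relative advantage computation that GRPO is generally built around: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. This is a generic illustration, not the EGM framework's code; the function name, the epsilon value, and the use of the population standard deviation are assumptions, and the reward function itself is not specified here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its sampling group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above the group average receive positive advantages and the rest receive negative ones, so no separate value model is needed to provide a baseline.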
