Name: nvidia/CUDA-Autocomplete API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nvidia

Model Overview

NVIDIA CUDA-Autocomplete is a 7.6-billion-parameter language model, fine-tuned from Qwen/Qwen2.5-Coder-7B, with a context length of 32,768 tokens. Its primary function is code completion, optimized specifically for CUDA programming. The model takes both the code before the cursor (prefix) and the code after the cursor (suffix) as input and generates the code that belongs between them.
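Because the prefix, the suffix, and the generated completion all share the 32,768-token window, a client usually has to trim long files before sending them. The following is a minimal sketch of one way to do that, not the model's or any extension's actual logic. The function name `fit_to_context`, the `max_new_tokens` and `reserved` defaults, and the 75/25 prefix-to-suffix split are all assumptions made for illustration. It works on already-tokenized input and keeps the tokens closest to the cursor.

```python
CONTEXT_LENGTH = 32768  # context length stated in the overview above


def fit_to_context(prefix_ids, suffix_ids, max_new_tokens=256,
                   reserved=8, prefix_share=0.75):
    """Trim token lists so prefix + suffix + completion fit in the window.

    Hypothetical helper: the defaults and the split ratio are assumptions,
    not values taken from the model card.
    """
    # Room left for context after reserving space for the completion and
    # for a few special tokens added by the prompt template.
    budget = CONTEXT_LENGTH - max_new_tokens - reserved
    if budget <= 0:
        raise ValueError("max_new_tokens and reserved leave no room for context")

    # Nothing to trim if everything already fits.
    if len(prefix_ids) + len(suffix_ids) <= budget:
        return list(prefix_ids), list(suffix_ids)

    prefix_budget = int(budget * prefix_share)
    suffix_budget = budget - prefix_budget

    # If one side is shorter than its share, hand the unused room to the other.
    if len(prefix_ids) < prefix_budget:
        suffix_budget = budget - len(prefix_ids)
    elif len(suffix_ids) < suffix_budget:
        prefix_budget = budget - len(suffix_ids)

    # Keep the END of the prefix and the START of the suffix, since those
    # are the tokens nearest the cursor.
    kept_prefix = list(prefix_ids[max(0, len(prefix_ids) - prefix_budget):])
    kept_suffix = list(suffix_ids[:suffix_budget])
    return kept_prefix, kept_suffix
```

Favoring the prefix is a common choice for completion, since the code immediately before the cursor tends to constrain the next tokens most strongly, but the right ratio depends on the workload.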

Key Capabilities

CUDA-Optimized Code Completion: Generates context-aware completions for CUDA code and also handles general-purpose programming.
Fill-in-the-Middle (FIM) Input: Uses both the prefix and the suffix around the cursor, so suggestions fit the code that follows as well as the code that precedes.
Integration: Designed to work with the Nsight Copilot extension for VSCode and Cursor.
Commercial Use: Licensed for both commercial and non-commercial applications under the NVIDIA Open Model License Agreement.
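To make the fill-in-the-middle input concrete, here is a minimal sketch of how a prompt could be assembled from a buffer and a cursor position. It assumes the model keeps the FIM special tokens documented for its base model, Qwen2.5-Coder (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`). This listing does not confirm the template, so check the full model card before relying on it. The helper names `split_at_cursor` and `build_fim_prompt` are hypothetical.

```python
# FIM special tokens as documented for Qwen2.5-Coder (the base model).
# ASSUMPTION: CUDA-Autocomplete uses the same tokens and ordering.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def split_at_cursor(source, cursor):
    """Split a buffer into (prefix, suffix) at a character offset."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


def build_fim_prompt(prefix, suffix):
    """Arrange prefix and suffix so the model generates the missing middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example buffer: a CUDA kernel whose body is still empty.
kernel = (
    "__global__ void saxpy(int n, float a, const float* x, float* y) {\n"
    "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "    \n"
    "}\n"
)

# Place the cursor at the end of the indented blank line.
cursor = kernel.index("    \n") + 4
prefix, suffix = split_at_cursor(kernel, cursor)
prompt = build_fim_prompt(prefix, suffix)
```

The text the model generates after `<|fim_middle|>` is the suggested insertion at the cursor. Here the suffix tells the model that the kernel closes right after the cursor, which is context a prefix-only prompt would not have.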

Training and Architecture

The model uses a Transformer architecture (Qwen2ForCausalLM) and was trained on a dataset that includes a subset of bigcode/the-stack-v2 and synthetically generated CUDA data. It is optimized for inference on NVIDIA GPU-accelerated systems such as H100 and DGX Spark.
