nvidia/CUDA-Autocomplete
NVIDIA CUDA-Autocomplete is a 7.6-billion-parameter model that NVIDIA fine-tuned from Qwen2.5-Coder-7B for CUDA code completion. It takes the code before and after the cursor (the prefix and suffix) and predicts the most likely next line of code. The model is intended to provide autocomplete for general programming and CUDA development, in particular through the Nsight Copilot extension for VSCode and Cursor.
Model Overview
NVIDIA CUDA-Autocomplete is a 7.6-billion-parameter language model, fine-tuned from Qwen/Qwen2.5-Coder-7B, with a context length of 32768 tokens. Its primary function is code completion, with a focus on CUDA programming. The model takes both the code before the cursor (prefix) and the code after the cursor (suffix) as input and generates a suggestion for the gap between them.
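The prefix and suffix input described above can be sketched as a prompt-building step. The sentinel tokens below are the fill-in-the-middle tokens documented for the Qwen2.5-Coder base model; that the fine-tuned model expects the same tokens in the same order is an assumption, not something this page states. The helper names `build_fim_prompt` and `first_line` are hypothetical and exist only for illustration.

```python
# Fill-in-the-middle sentinel tokens of the Qwen2.5-Coder base model.
# Assumption: CUDA-Autocomplete keeps the same tokens and ordering.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the cursor in prefix-suffix-middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def first_line(completion: str) -> str:
    """Keep only the first non-empty line of a raw completion.

    Hypothetical post-processing for a next-line suggestion; the actual
    extension may handle model output differently.
    """
    for line in completion.splitlines():
        if line.strip():
            return line
    return ""


# Example: the cursor sits on the indented empty line inside a CUDA kernel.
prefix = (
    "__global__ void saxpy(int n, float a, const float *x, float *y) {\n"
    "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "    "
)
suffix = "\n}\n"
prompt = build_fim_prompt(prefix, suffix)
```

The resulting `prompt` string is what would be tokenized and passed to the model; the text generated after the final sentinel is the candidate for the missing middle.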
Key Capabilities
- CUDA-Optimized Code Completion: Specialized in generating accurate and contextually relevant code for CUDA development, alongside general programming.
- Fill-in-the-Middle (FIM) Input: Utilizes both prefix and suffix code context for more precise suggestions.
- Integration: Designed for use with the Nsight Copilot extension for VSCode and Cursor.
- Commercial Use: Licensed for both commercial and non-commercial applications under the NVIDIA Open Model License Agreement.
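Because the prefix and suffix share a single 32768-token context window, a caller working with large files has to decide how much of each side to keep. The sketch below shows one possible policy: keep the tokens nearest the cursor, favoring the prefix. The function name `trim_context`, the 75/25 split, and the 128-token reserve are hypothetical choices for illustration, not documented behavior of the model or the extension.

```python
from typing import List, Tuple

CONTEXT_LENGTH = 32768  # context length stated for the model


def trim_context(
    prefix_ids: List[int],
    suffix_ids: List[int],
    reserved: int = 128,         # hypothetical room for sentinel tokens and the generated line
    prefix_share: float = 0.75,  # hypothetical split favoring code before the cursor
    limit: int = CONTEXT_LENGTH,
) -> Tuple[List[int], List[int]]:
    """Trim token-id lists so that prefix + suffix + reserved fits within limit."""
    budget = limit - reserved
    if budget <= 0:
        raise ValueError("reserved must be smaller than limit")
    if len(prefix_ids) + len(suffix_ids) <= budget:
        return prefix_ids, suffix_ids

    prefix_budget = int(budget * prefix_share)
    suffix_budget = budget - prefix_budget
    # If one side does not use its full share, hand the spare room to the other.
    if len(prefix_ids) < prefix_budget:
        suffix_budget = budget - len(prefix_ids)
    elif len(suffix_ids) < suffix_budget:
        prefix_budget = budget - len(suffix_ids)

    # Keep the tokens closest to the cursor: the tail of the prefix
    # and the head of the suffix.
    kept_prefix = prefix_ids[-prefix_budget:] if prefix_budget > 0 else []
    kept_suffix = suffix_ids[:suffix_budget]
    return kept_prefix, kept_suffix
```

For example, with `limit=10` and `reserved=2`, two 10-token inputs are cut to the last 6 prefix tokens and the first 2 suffix tokens. The token ids would come from the model's tokenizer, which this sketch deliberately leaves out.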
Training and Architecture
The model uses a Transformer architecture (Qwen2ForCausalLM) and was trained on a dataset that includes a subset of bigcode/the-stack-v2 and synthetically generated CUDA data. It is optimized for inference on NVIDIA GPU-accelerated systems such as H100 and DGX Spark.