nvidia/OpenCodeReasoning-Nemotron-14B

Hosted on Hugging Face · Text generation
Concurrency cost: 1 · Model size: 14.8B · Quantization: FP8 · Context length: 32K · Published: Apr 15, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

OpenCodeReasoning-Nemotron-14B is a 14.8 billion parameter large language model developed by NVIDIA, derived from Qwen2.5-14B-Instruct. This model is specifically post-trained for reasoning in code generation tasks, supporting a context length of up to 32,768 tokens. It excels in competitive programming benchmarks like LiveCodeBench and CodeContest, making it suitable for advanced code-related reasoning applications.


Overview

NVIDIA's OpenCodeReasoning-Nemotron-14B is a 14.8 billion parameter large language model, built upon the Qwen2.5-14B-Instruct architecture. Its core differentiator is its specialized post-training for code generation reasoning, making it highly effective for complex programming challenges. The model supports an extensive context window of up to 32,768 tokens, allowing it to process and generate longer, more intricate code solutions.
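Since the model ships as open weights under its Hugging Face model ID, it can be served with standard Hugging Face tooling. A minimal sketch, assuming the `transformers` text-generation pipeline and chat-style message input; the prompt wording and generation settings are illustrative, not taken from the card:

```python
# Hedged sketch: running OpenCodeReasoning-Nemotron-14B via the Hugging Face
# transformers pipeline. The model ID comes from the card; the prompt template,
# dtype, and max_new_tokens below are assumptions for illustration.

def build_messages(question: str) -> list[dict]:
    # Wrap a competitive-programming question as a single user turn.
    # The exact prompt format expected by the model is an assumption here.
    return [{
        "role": "user",
        "content": f"Write a Python solution for the following problem:\n\n{question}",
    }]

if __name__ == "__main__":
    # Heavy dependencies are imported lazily so the helper above stays
    # usable without a GPU or the model weights downloaded.
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="nvidia/OpenCodeReasoning-Nemotron-14B",
        torch_dtype=torch.bfloat16,   # FP8 deployment would use a serving stack
        device_map="auto",
    )
    out = generator(
        build_messages("Given an integer n, print the sum 1 + 2 + ... + n."),
        max_new_tokens=1024,
    )
    print(out[0]["generated_text"])
```

In practice a 14.8B model at this context length is typically deployed behind an inference server rather than an in-process pipeline, but the message-building step is the same either way.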

Key Capabilities

  • Advanced Code Reasoning: Specifically optimized to enhance reasoning capabilities for code generation, distinguishing it from general-purpose LLMs.
  • Competitive Programming Performance: Demonstrates strong performance on benchmarks such as LiveCodeBench and CodeContest, outperforming comparable distilled models in the 7B–14B range.
  • Large Context Window: With a 32K token context length, it can handle substantial codebases and detailed problem descriptions.
  • Commercial Use Ready: The model is available for both commercial and non-commercial applications under the Apache 2.0 license.
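Because the prompt and the generated solution share the same 32,768-token window, callers must budget tokens for both. A minimal sketch of that bookkeeping; the limit is from the card, while the helper name and the idea of reserving generation headroom are illustrative (real token counts should come from the model's tokenizer):

```python
# Hedged sketch: checking that a prompt plus the requested generation budget
# fits inside the model's 32,768-token context window. The constant is from
# the model card; the helper itself is a hypothetical convenience.
MAX_CONTEXT = 32_768

def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    max_context: int = MAX_CONTEXT) -> bool:
    # Prompt tokens and generated tokens occupy one shared context window,
    # so their sum must stay within the limit.
    return prompt_tokens + max_new_tokens <= max_context
```

For example, a 30,000-token problem description still leaves roughly 2,700 tokens of room for the model's reasoning and final code.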

Training and Architecture

OpenCodeReasoning-Nemotron-14B was trained on the OpenCodeReasoning dataset, which comprises 736k samples pairing competitive programming questions with DeepSeek-R1-generated responses. It uses a dense decoder-only Transformer architecture based on the Qwen2.5-14B-Instruct network. The model is designed and optimized for NVIDIA GPU-accelerated systems, leveraging hardware such as the NVIDIA Ampere and Hopper architectures for efficient inference.