AscendKernelGen/KernelGen-LM-14B

Text generation · Concurrency cost: 1 · Model size: 14B · Quantization: FP8 · Context length: 32k · Published: Jan 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

AscendKernelGen/KernelGen-LM-14B is a domain-adaptive large language model built on the Qwen3-14B backbone, specialized for low-level NPU kernel generation for the Huawei Ascend architecture using AscendC. It is trained on the Ascend-CoT dataset and refined with reinforcement learning using execution feedback. This model excels at generating hardware-specific code, demonstrating significant improvements on complex Level-2 kernels where general-purpose models fail. Its primary use is to bridge the gap between general code generation and hardware-specific NPU programming.


KernelGen-LM-14B: Specialized NPU Kernel Generation

KernelGen-LM-14B, developed by AscendKernelGen, is a domain-adaptive large language model built upon the Qwen3-14B backbone. It is specifically designed for generating low-level NPU kernels for the Huawei Ascend architecture using the AscendC programming language. The model leverages a unique closed-loop system for data construction, training, and evaluation, making it highly specialized for hardware-specific programming.
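As a rough illustration of how a kernel-generation model like this might be driven, the sketch below composes a request for an AscendC kernel and extracts the fenced code block from a model response. The prompt wording, the assumption that the model returns its kernel in a ```cpp fence, and both helper functions are illustrative, not part of the published model interface.

```python
import re
from typing import Optional

# Repository id taken from this card; how the weights are actually served
# (e.g. via transformers) is an assumption.
MODEL_ID = "AscendKernelGen/KernelGen-LM-14B"

def build_kernel_prompt(op_description: str) -> str:
    """Compose a kernel-generation request for an Ascend NPU target."""
    return (
        "Write an AscendC kernel for the Huawei Ascend NPU.\n"
        f"Operator specification: {op_description}\n"
        "Return the complete kernel inside a ```cpp code fence."
    )

def extract_kernel(model_output: str) -> Optional[str]:
    """Pull the first fenced C++ block out of the model's response text."""
    match = re.search(r"```(?:cpp|c\+\+)\n(.*?)```", model_output, re.DOTALL)
    return match.group(1).strip() if match else None

# Example with a mocked model response:
response = (
    "Here is the kernel:\n"
    "```cpp\nextern \"C\" __global__ __aicore__ void add_kernel();\n```"
)
kernel = extract_kernel(response)
```

In practice the prompt would be sent through the model's chat template and the extracted kernel handed to the AscendC toolchain for compilation.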

Key Innovations & Capabilities

  • Ascend-CoT Dataset: Trained on a high-quality, domain-specific dataset incorporating Chain-of-Thought (CoT) reasoning, which combines documentation-based, code-centric, and general reasoning chains to capture the structured logic of NPU programming.
  • Domain-Adaptive Post-Training: Utilizes a two-stage optimization process involving Supervised Fine-Tuning (SFT) with error-derived supervision and Reinforcement Learning (RL) using Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
  • Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark assessing compilation success, functional correctness, and latency on real Ascend hardware.
  • Enhanced Performance: Demonstrates significant improvement on complex Level-2 kernels compared to general-purpose models like Qwen3 and Llama3.1, which often fail completely on such tasks.
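The RL stage above is described as DPO driven by execution-based correctness and performance signals. A minimal sketch of how (chosen, rejected) preference pairs could be assembled from compile/run outcomes follows; the field names, the lexicographic ranking rule, and the pairing strategy are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class KernelResult:
    """Execution feedback for one candidate kernel (schema is illustrative)."""
    code: str
    compiled: bool
    correct: bool
    latency_us: Optional[float]  # None if the kernel never ran

def score(r: KernelResult) -> tuple:
    # Lexicographic preference: correctness first, then compilation,
    # then lower latency (negated so "bigger is better").
    lat = -r.latency_us if r.latency_us is not None else float("-inf")
    return (r.correct, r.compiled, lat)

def build_dpo_pairs(candidates: List[KernelResult]) -> List[Tuple[str, str]]:
    """Pair each strictly better candidate with each worse one as (chosen, rejected)."""
    ranked = sorted(candidates, key=score, reverse=True)
    pairs = []
    for i, better in enumerate(ranked):
        for worse in ranked[i + 1:]:
            if score(better) > score(worse):
                pairs.append((better.code, worse.code))
    return pairs
```

Pairs produced this way would then feed a standard DPO objective, letting hardware execution rather than human annotation supply the preference labels.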

Use Cases & Strengths

KernelGen-LM-14B excels at generating accurate and efficient AscendC code for NPU kernels. Its structured reasoning, as shown in case studies, allows it to comprehend and implement complex instructions and tiling strategies, leading to correct and optimized code. This model is ideal for developers working on Huawei Ascend NPUs who require automated, high-quality kernel generation, bridging the gap between high-level AI development and low-level hardware optimization.
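The NPUKernelBench evaluation mentioned above tracks three signals: compilation success, functional correctness, and latency on real hardware. The sketch below shows one plausible way to aggregate per-kernel outcomes into those metrics; the record schema and aggregation choices are assumptions, not the benchmark's actual scoring code.

```python
from typing import Dict, List

def summarize_results(results: List[dict]) -> Dict[str, float]:
    """Aggregate per-kernel outcomes into benchmark-style metrics.

    Each result is a dict with 'compiled' (bool), 'correct' (bool), and
    'latency_us' (float, or None if the kernel never ran); this schema
    is illustrative.
    """
    n = len(results)
    compiled = sum(r["compiled"] for r in results)
    correct = sum(r["correct"] for r in results)
    # Only functionally correct kernels contribute to the latency average.
    latencies = [
        r["latency_us"]
        for r in results
        if r["correct"] and r["latency_us"] is not None
    ]
    return {
        "compile_rate": compiled / n,
        "pass_rate": correct / n,
        "mean_latency_us": sum(latencies) / len(latencies) if latencies else float("nan"),
    }
```

Restricting the latency average to correct kernels keeps the performance metric from rewarding fast-but-wrong outputs.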