AscendKernelGen/KernelGen-LM-32B
AscendKernelGen/KernelGen-LM-32B is a domain-adaptive large language model built on the Qwen3-32B backbone and specialized for generating low-level NPU kernels for the Huawei Ascend architecture in AscendC. Trained on the Ascend-CoT dataset and refined with reinforcement learning, it achieves a 96.5% compilation success rate (Pass@10) on complex Level-2 tasks. The model generates functional hardware kernels and significantly outperforms general-purpose models at hardware-specific code generation.
Overview
KernelGen-LM-32B is a specialized large language model designed for generating low-level NPU kernels for the Huawei Ascend architecture, utilizing the AscendC programming language. Built on the Qwen3-32B backbone, this model is a product of the AscendKernelGen (AKGen) framework, which focuses on bridging the gap between general code generation and hardware-specific programming through a closed-loop system.
Key Capabilities & Innovations
- Domain-Specific Training: Leverages the Ascend-CoT Dataset, a high-quality, domain-specific dataset incorporating Chain-of-Thought (CoT) reasoning for NPU programming.
- Two-Stage Optimization: Undergoes Supervised Fine-Tuning (SFT) to correct API misuse and numerical errors, followed by Reinforcement Learning (RL) using Direct Preference Optimization (DPO) with execution-based correctness and performance signals.
- Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark assessing compilation success, functional correctness, and performance on real Ascend hardware.
- Unprecedented Success Rates: Achieves a 96.5% compilation success rate (Pass@10) on complex Level-2 tasks, on which the baseline models fail completely.
- Enhanced Code Quality: Demonstrates expert-level reasoning and accurate implementation for complex AscendC instructions and tiling strategies, as shown in case studies involving Muls instruction usage and Swish operator implementation.
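The Pass@10 figure above is easier to interpret with the metric written out. The following is a minimal sketch of the widely used unbiased pass@k estimator, assuming that n candidate kernels are sampled per task and c of them compile. The model card does not state which estimator NPUKernelBench uses, so the formula choice and the function names `pass_at_k` and `mean_pass_at_k` are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: number of candidates sampled for the task
    c: number of those candidates that passed (e.g. compiled)
    k: budget of attempts being evaluated (k <= n)

    Returns the probability that at least one of k candidates drawn
    without replacement from the n samples is a passing one.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures: every size-k draw contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; results holds one (n, c) pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With exactly 10 samples per task, pass@10 reduces to the fraction of tasks for which at least one sample passes, which is one way to read a task-level success rate such as the one reported above.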
Use Cases
This model is ideal for developers and researchers focused on:
- Automated generation of highly optimized NPU kernels for Huawei Ascend devices.
- Accelerating the development of hardware-specific code in AscendC.
- Research into domain-adaptive LLMs for specialized hardware programming.
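When the model is used for automated kernel generation, the output of a generated kernel still has to be compared against a trusted reference. The sketch below shows one host-side way to do that for the Swish operator mentioned earlier. It assumes the common definition Swish(x) = x * sigmoid(x) with beta = 1, and the tolerances and the names `swish_reference` and `matches_reference` are hypothetical; none of this is taken from NPUKernelBench or the AscendC toolchain.

```python
import math


def swish_reference(x: float) -> float:
    """Reference Swish with beta = 1 (an assumption): x * sigmoid(x).

    The two branches avoid overflow in exp() for large |x|.
    """
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matches_reference(
    kernel_output: list[float],
    inputs: list[float],
    rtol: float = 1e-3,   # illustrative tolerance, not a benchmark setting
    atol: float = 1e-5,   # illustrative tolerance, not a benchmark setting
) -> bool:
    """Return True if every kernel output is close to the reference value."""
    if len(kernel_output) != len(inputs):
        return False
    return all(
        math.isclose(y, swish_reference(x), rel_tol=rtol, abs_tol=atol)
        for x, y in zip(inputs, kernel_output)
    )
```

In practice `kernel_output` would be the values copied back from the device after running the generated kernel on `inputs`; lower-precision data types such as half precision generally need looser tolerances than the defaults shown here.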