AscendKernelGen/KernelGen-LM-32B-RL
KernelGen-LM-32B-RL by AscendKernelGen is a 32-billion parameter domain-adaptive large language model built on the Qwen3-32B backbone, specialized for generating low-level NPU kernels for the Huawei Ascend architecture using AscendC. It is trained on the Ascend-CoT dataset and refined with reinforcement learning using execution feedback. This model achieves high success rates in generating complex, functional hardware kernels, significantly improving compilation success and functional correctness for NPU programming tasks.
Loading preview...
Overview
AscendKernelGen/KernelGen-LM-32B-RL is a specialized large language model designed for generating low-level Neural Processing Unit (NPU) kernels, specifically for the Huawei Ascend architecture using the AscendC programming language. Built upon the Qwen3-32B backbone, this model is a product of the AscendKernelGen (AKGen) framework, which focuses on bridging the gap between general-purpose code generation and hardware-specific programming.
Key Capabilities & Innovations
- Domain-Specific Training: Utilizes the high-quality, domain-specific Ascend-CoT dataset, which incorporates Chain-of-Thought (CoT) reasoning from documentation, real-world kernel implementations, and general reasoning chains.
- Two-Stage Optimization: Employs a unique two-stage post-training process: Supervised Fine-Tuning (SFT) with error-derived supervision, followed by Reinforcement Learning (RL) using Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
- Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark assessing compilation success, functional correctness, and latency on real Ascend hardware.
- Unprecedented Performance: Achieves significant improvements on complex Level-2 kernels, with compilation success rates up to 95.5% (Pass@10) and functional correctness of 64.3%, effectively solving tasks where general-purpose models like Qwen3 and Llama3.1 fail completely.
Use Cases
This model is ideal for developers and researchers working on:
- Automated generation of highly optimized NPU kernels for Huawei Ascend processors.
- Accelerating the development of hardware-specific code in AscendC.
- Tasks requiring precise, functional, and performant low-level hardware programming.