AscendKernelGen/KernelGen-LM-4B
KernelGen-LM-4B is a domain-adaptive large language model developed by AscendKernelGen, specialized for generating low-level NPU kernels for the Huawei Ascend architecture using AscendC. Built on the Qwen3-4B backbone, it is trained on the Ascend-CoT dataset and refined with reinforcement learning using execution feedback. This model excels at hardware-specific code generation, demonstrating significant improvements in complex kernel implementation compared to general-purpose LLMs. Its primary use is to automate and optimize the creation of NPU kernels, bridging the gap between high-level code generation and hardware-specific programming.
Loading preview...
Overview
AscendKernelGen/KernelGen-LM-4B is a specialized large language model designed for generating highly optimized low-level NPU (Neural Processing Unit) kernels for Huawei Ascend hardware, utilizing the AscendC programming language. Developed by AscendKernelGen, this model is built upon the Qwen3-4B architecture and has undergone extensive domain-adaptive post-training.
Key Capabilities
- Domain-Specific Code Generation: Excels at producing correct and efficient AscendC code for NPU kernels, a task where general-purpose LLMs often fail.
- Chain-of-Thought (CoT) Reasoning: Leverages the proprietary Ascend-CoT dataset, which incorporates documentation-based, code-centric, and general reasoning chains to understand complex NPU programming logic.
- Reinforcement Learning with Execution Feedback: Utilizes Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) based on execution-driven correctness and performance signals, ensuring generated code is both functional and performant.
- Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark assessing compilation, functional correctness, and latency on real Ascend hardware.
- Improved Complex Kernel Handling: Demonstrates significant qualitative and quantitative improvements in generating complex Level-2 kernels and handling intricate tiling strategies compared to baseline models.
Good For
- Automating NPU Kernel Development: Ideal for developers and researchers working on Huawei Ascend platforms who need to generate optimized low-level kernels.
- Bridging Hardware-Software Gaps: Useful for tasks requiring precise, hardware-specific code generation that general LLMs cannot achieve.
- Research in Domain-Adaptive LLMs: Provides a strong example and framework for developing LLMs specialized in highly technical and hardware-specific domains.