AscendKernelGen/KernelGen-LM-8B
KernelGen-LM-8B by AscendKernelGen is a domain-adaptive large language model built on the Qwen3-8B backbone, specialized for low-level NPU kernel generation for the Huawei Ascend architecture using AscendC. It is trained on the Ascend-CoT dataset and refined with reinforcement learning using execution feedback. This model excels at generating hardware-specific code, demonstrating significant improvements on complex NPU kernel tasks where general-purpose models often fail. Its primary use is to bridge the gap between general code generation and hardware-specific programming for Ascend NPUs.
Loading preview...
AscendKernelGen/KernelGen-LM-8B Overview
AscendKernelGen/KernelGen-LM-8B is a specialized large language model developed by AscendKernelGen, based on the Qwen3-8B architecture. It is uniquely designed for generating low-level NPU kernels for the Huawei Ascend platform using the AscendC programming language. The model's development is part of the AscendKernelGen (AKGen) framework, which focuses on a closed-loop system for data construction, training, and evaluation to address hardware-specific programming challenges.
Key Innovations & Capabilities
- Domain-Specific Training: Utilizes the Ascend-CoT Dataset, a high-quality dataset incorporating Chain-of-Thought (CoT) reasoning derived from documentation, real-world kernel implementations, and general reasoning chains, specifically tailored for low-level NPU programming.
- Two-Stage Optimization: Employs a Domain-Adaptive Post-Training process involving Supervised Fine-Tuning (SFT) to correct API misuse and numerical errors, followed by Reinforcement Learning (RL) with Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
- Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark assessing compilation success, functional correctness, and latency on actual Ascend hardware.
- Enhanced Performance: Demonstrates significant qualitative and quantitative improvements in generating complex Level-2 kernels, effectively solving tasks where general-purpose models (like Qwen3, Llama3.1) completely fail.
Use Cases
- Automated NPU Kernel Generation: Ideal for developers and researchers working with Huawei Ascend NPUs who need to generate optimized, hardware-specific kernels.
- Bridging Code Generation Gaps: Addresses the challenge of translating high-level programming concepts into efficient, low-level hardware instructions for specialized accelerators.
- Research in Domain-Adaptive LLMs: Provides a strong baseline and framework for further research into LLMs specialized for hardware programming and domain-specific code generation.