Name: AscendKernelGen/KernelGen-LM-32B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: AscendKernelGen

Overview

KernelGen-LM-32B is a specialized large language model designed for generating low-level NPU kernels for the Huawei Ascend architecture, utilizing the AscendC programming language. Built on the Qwen3-32B backbone, this model is a product of the AscendKernelGen (AKGen) framework, which focuses on bridging the gap between general code generation and hardware-specific programming through a closed-loop system.

Key Capabilities & Innovations

Domain-Specific Training: Leverages the Ascend-CoT Dataset, a high-quality, domain-specific dataset incorporating Chain-of-Thought (CoT) reasoning for NPU programming.
Two-Stage Optimization: Undergoes Supervised Fine-Tuning (SFT) to correct API misuse and numerical errors, followed by Reinforcement Learning (RL) using Direct Preference Optimization (DPO) with execution-based correctness and performance signals.
Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark assessing compilation success, functional correctness, and performance on real Ascend hardware.
Unprecedented Success Rates: Achieves a 96.5% compilation success rate (Pass@10) on complex Level-2 tasks, a significant improvement over baselines that fail completely.
Enhanced Code Quality: Demonstrates expert-level reasoning and accurate implementation for complex AscendC instructions and tiling strategies, as shown in case studies involving Muls instruction usage and Swish operator implementation.

Use Cases

This model is ideal for developers and researchers focused on:

Automated generation of highly optimized NPU kernels for Huawei Ascend devices.
Accelerating the development of hardware-specific code in AscendC.
Research into domain-adaptive LLMs for specialized hardware programming.

Overview

Overview

Key Capabilities & Innovations

Use Cases

Full Model Card (README)