AscendKernelGen/KernelGen-LM-32B-RL: Specialized NPU Kernel Generation
KernelGen-LM-32B-RL is a specialized large language model developed by AscendKernelGen, designed to generate low-level NPU kernels for the Huawei Ascend architecture in the AscendC programming language. Built on the Qwen3-32B backbone, the model is optimized in two stages, making it well suited to hardware-specific code generation.
Key Capabilities and Innovations
- Domain-Specific Training: The model is trained on the Ascend-CoT dataset, which incorporates Chain-of-Thought (CoT) reasoning from documentation, real-world kernel implementations, and general reasoning chains, specifically tailored for low-level NPU programming.
- Two-Stage Optimization: A Supervised Fine-Tuning (SFT) phase with error-derived supervision corrects API misuse and numerical errors, followed by a Reinforcement Learning (RL) phase using Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
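The model card does not publish training code, but the DPO objective named above is standard. The following is a minimal sketch of the per-pair DPO loss, where preference pairs could plausibly come from execution signals (a compiling, numerically correct kernel preferred over a failing one); all function and variable names here are illustrative, not from the released training pipeline.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_* are summed token log-probabilities of the preferred (chosen)
    and dispreferred (rejected) completion under the policy being
    trained; ref_logp_* are the same quantities under the frozen SFT
    reference model. beta scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen completion than the reference model does.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy cleanly prefers
    # the chosen (e.g. compiling, correct) kernel.
    return math.log1p(math.exp(-margin))
```

In this setup the "reward model" is implicit: driving the margin up pushes probability mass toward kernels that compiled and passed correctness checks, relative to the SFT reference.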
- Hardware-Grounded Evaluation: Performance is rigorously validated using NPUKernelBench, a comprehensive benchmark that assesses compilation success, functional correctness, and latency on actual Ascend hardware.
- Significant Performance Improvement: The model demonstrates substantial improvements on complex Level-2 kernels, achieving a 95.5% compilation success rate (Pass@10) and 64.3% functional correctness, effectively solving tasks where general-purpose models like Qwen3 and Llama3.1 fail completely.
Good for
- Developers and researchers working on Huawei Ascend NPU kernel development.
- Automating the generation of AscendC code for hardware acceleration.
- Tasks requiring high compilation success and functional correctness in low-level hardware programming.
- Bridging the gap between general-purpose code generation and hardware-specific programming.