AscendKernelGen/KernelGen-LM-4B

Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Jan 23, 2026 · License: apache-2.0 · Architecture: Transformer

KernelGen-LM-4B is a domain-adaptive large language model developed by AscendKernelGen, specialized for generating low-level NPU kernels for the Huawei Ascend architecture using AscendC. Built on the Qwen3-4B backbone, it is trained on the Ascend-CoT dataset and refined with reinforcement learning using execution feedback. This model excels at hardware-specific code generation, demonstrating significant improvements in complex kernel implementation compared to general-purpose LLMs. Its primary use is to automate and optimize the creation of NPU kernels, bridging the gap between high-level code generation and hardware-specific programming.


Overview

AscendKernelGen/KernelGen-LM-4B is a specialized large language model designed for generating highly optimized low-level NPU (Neural Processing Unit) kernels for Huawei Ascend hardware, utilizing the AscendC programming language. Developed by AscendKernelGen, this model is built upon the Qwen3-4B architecture and has undergone extensive domain-adaptive post-training.
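The card does not specify a prompt format, so the following is a hedged sketch of how one might structure a kernel-generation request in the common chat-message style; the system/user wording, the `build_kernel_prompt` helper, and the `vec_add` signature are all illustrative assumptions, not part of the model's documented interface.

```python
def build_kernel_prompt(op_name, signature, constraints):
    """Assemble a chat-style request for an AscendC kernel.
    (Hypothetical prompt format -- the model card does not define one.)"""
    system = ("You are an expert in AscendC, the kernel programming language "
              "for Huawei Ascend NPUs. Produce a complete, compilable kernel.")
    user = (f"Implement the operator `{op_name}` with this signature:\n"
            f"{signature}\n"
            f"Constraints: {constraints}")
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

# Example request for a simple element-wise kernel (signature is illustrative).
messages = build_kernel_prompt(
    "vec_add",
    'extern "C" __global__ __aicore__ void vec_add(GM_ADDR x, GM_ADDR y, GM_ADDR z);',
    "float16 inputs, 2048-element tiles",
)
```

With the standard Hugging Face `transformers` chat interface, such a message list would typically be passed through the tokenizer's `apply_chat_template` before generation.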

Key Capabilities

  • Domain-Specific Code Generation: Excels at producing correct and efficient AscendC code for NPU kernels, a task where general-purpose LLMs often fail.
  • Chain-of-Thought (CoT) Reasoning: Leverages the proprietary Ascend-CoT dataset, which incorporates documentation-based, code-centric, and general reasoning chains to understand complex NPU programming logic.
  • Reinforcement Learning with Execution Feedback: Applies Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) on execution-derived correctness and performance signals, steering generation toward code that both works and runs fast.
  • Hardware-Grounded Evaluation: Validated using NPUKernelBench, a comprehensive benchmark assessing compilation, functional correctness, and latency on real Ascend hardware.
  • Improved Complex Kernel Handling: Demonstrates significant qualitative and quantitative improvements in generating complex Level-2 kernels and handling intricate tiling strategies compared to baseline models.
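The execution-feedback stage described above can be pictured as turning hardware results into DPO preference pairs. A minimal sketch, assuming a scalar reward that gates on compilation and correctness before favoring lower latency (the `KernelResult` fields, `reward` shape, and pair format are illustrative assumptions, not the published training recipe):

```python
from dataclasses import dataclass

@dataclass
class KernelResult:
    compiled: bool      # did the AscendC kernel build?
    correct: bool       # did outputs match the reference?
    latency_us: float   # measured on hardware

def reward(r: KernelResult) -> float:
    """Hypothetical scalar reward: failing to compile is worst,
    incorrect output is neutral, correct kernels score higher when faster."""
    if not r.compiled:
        return -1.0
    if not r.correct:
        return 0.0
    return 1.0 + 1.0 / r.latency_us

def to_preference_pair(prompt, cand_a, cand_b, res_a, res_b):
    """Order two sampled kernels into a (chosen, rejected) DPO pair."""
    ra, rb = reward(res_a), reward(res_b)
    if ra == rb:
        return None  # tie: no preference signal
    if ra > rb:
        return {"prompt": prompt, "chosen": cand_a, "rejected": cand_b}
    return {"prompt": prompt, "chosen": cand_b, "rejected": cand_a}

# A correct-but-slower kernel still beats an incorrect one.
pair = to_preference_pair(
    "implement vec_add", "kernel_a", "kernel_b",
    KernelResult(compiled=True, correct=True, latency_us=10.0),
    KernelResult(compiled=True, correct=False, latency_us=5.0),
)
```

Pairs in this `{"prompt", "chosen", "rejected"}` shape are the form DPO trainers commonly consume.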

Good For

  • Automating NPU Kernel Development: Ideal for developers and researchers working on Huawei Ascend platforms who need to generate optimized low-level kernels.
  • Bridging Hardware-Software Gaps: Useful for tasks requiring precise, hardware-specific code generation, where general-purpose LLMs typically fall short.
  • Research in Domain-Adaptive LLMs: Provides a strong example and framework for developing LLMs specialized in highly technical and hardware-specific domains.