# DR.Kernel-8B: Specialized for GPU Kernel Optimization
hkust-nlp/drkernel-8b is an 8-billion-parameter model built on the Qwen3-8B architecture and specialized for generating and optimizing GPU kernels, particularly Triton kernels. Developed by hkust-nlp, it stands out because it is trained for iterative optimization with execution feedback from KernelGYM, rather than for single-shot code generation.
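As a minimal sketch of how such a model might be prompted, the snippet below wraps a PyTorch reference implementation in a kernel-generation request. The prompt wording and the commented-out generation call are illustrative assumptions, not the official DR.Kernel prompt format.

```python
# Illustrative sketch: prompting for a Triton kernel from a PyTorch
# reference. The prompt text is an assumption, not DR.Kernel's format.

PYTORCH_REFERENCE = '''
import torch

def forward(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y
'''

def build_kernel_prompt(reference_code: str) -> str:
    """Wrap a PyTorch reference in a kernel-generation request."""
    return (
        "Rewrite the following PyTorch code as an optimized Triton kernel.\n"
        "Return a complete, runnable Python module.\n\n"
        f"```python\n{reference_code.strip()}\n```"
    )

prompt = build_kernel_prompt(PYTORCH_REFERENCE)

# Actual generation (requires a GPU and the transformers library):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("hkust-nlp/drkernel-8b")
# model = AutoModelForCausalLM.from_pretrained(
#     "hkust-nlp/drkernel-8b", device_map="auto")
# inputs = tok.apply_chat_template(
#     [{"role": "user", "content": prompt}],
#     add_generation_prompt=True, return_tensors="pt").to(model.device)
# output = model.generate(inputs, max_new_tokens=2048)
```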
## Key Capabilities
- Triton Kernel Generation: Creates optimized Triton kernel implementations from PyTorch reference architectures.
- Iterative Refinement: Designed for multi-turn optimization, leveraging execution feedback to iteratively improve kernel performance and correctness.
- Agentic Code Refinement: Supports agentic workflows for code optimization under execution-based reward signals.
- Qwen3-8B Base: Benefits from the strong foundational capabilities of the Qwen3-8B model family.
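To make the iterative-refinement capability concrete, here is a sketch of turning KernelGYM-style execution results into a follow-up message for the next turn. The feedback fields and message wording are assumptions for illustration, not KernelGYM's actual schema.

```python
# Sketch: convert execution feedback into the next-turn refinement message.
# The ExecutionFeedback fields below are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExecutionFeedback:
    compiled: bool
    correct: bool
    speedup: Optional[float]  # measured vs. the PyTorch reference, if it ran
    error_log: str = ""

def feedback_to_message(fb: ExecutionFeedback) -> str:
    if not fb.compiled:
        return f"The kernel failed to compile:\n{fb.error_log}\nPlease fix it."
    if not fb.correct:
        return (f"The kernel compiled but produced wrong results:\n"
                f"{fb.error_log}\nPlease fix the logic.")
    return (f"The kernel is correct with a {fb.speedup:.2f}x speedup over the "
            "reference. Please optimize it further.")

msg = feedback_to_message(ExecutionFeedback(compiled=True, correct=True, speedup=1.8))
```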
## Good for
- Kernel Generation Research: Ideal for academic and industrial research into automated GPU kernel development.
- Triton Kernel Optimization: Specifically suited for tasks requiring the generation and fine-tuning of high-performance Triton kernels.
- Benchmarking: Useful for evaluating kernel generation and optimization techniques using frameworks like KernelGYM and KernelBench.
- Multi-turn Code Agents: Applicable in scenarios where code needs to be iteratively improved based on compilation, correctness, and performance feedback.
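The multi-turn agent setting above can be sketched as a simple generate-evaluate-refine loop. The `generate` and `evaluate` callables below are mocked stand-ins, not real KernelGYM or model APIs, and the prompt-update step is deliberately simplified.

```python
# Skeleton of a multi-turn kernel refinement loop: generate a candidate,
# execute it, fold the result into the next prompt, keep the best attempt.
# All components here are mocks for illustration.
from typing import Callable, Tuple

def refine_kernel(
    generate: Callable[[str], str],                 # prompt -> candidate code
    evaluate: Callable[[str], Tuple[bool, float]],  # code -> (correct, speedup)
    prompt: str,
    max_turns: int = 4,
) -> Tuple[str, float]:
    """Keep the best correct candidate seen across refinement turns."""
    best_code, best_speedup = "", 0.0
    for _ in range(max_turns):
        code = generate(prompt)
        correct, speedup = evaluate(code)
        if correct and speedup > best_speedup:
            best_code, best_speedup = code, speedup
        # Fold the outcome into the next-turn prompt (simplified).
        prompt += f"\n# previous attempt: correct={correct}, speedup={speedup:.2f}x"
    return best_code, best_speedup

# Mock demonstration: each turn "improves" the kernel a little.
attempts = iter([("v1", (False, 0.0)), ("v2", (True, 1.2)), ("v3", (True, 1.7))])
results = {}

def mock_generate(prompt: str) -> str:
    code, result = next(attempts)
    results[code] = result
    return code

def mock_evaluate(code: str) -> Tuple[bool, float]:
    return results[code]

best, speedup = refine_kernel(mock_generate, mock_evaluate,
                              "optimize add kernel", max_turns=3)
```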
This model is trained with the two-stage DR.Kernel pipeline: cold-start SFT on hkust-nlp/drkernel-coldstart-8k, followed by multi-turn RL (TRLOO + MRS + PR + PRS) on hkust-nlp/drkernel-rl-data, with validation on hkust-nlp/drkernel-validation-data.
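For intuition about the execution-based reward signal such an RL stage optimizes, here is a hedged sketch: zero reward for kernels that fail to compile or are incorrect, otherwise a reward scaled by the measured speedup. The exact shaping used in DR.Kernel's TRLOO + MRS + PR + PRS recipe is not reproduced here; this is purely an illustrative assumption.

```python
# Illustrative execution-based reward (assumption, not DR.Kernel's actual
# shaping): broken or incorrect kernels get 0; correct kernels get 1.0
# plus a bonus for beating the PyTorch reference.
def kernel_reward(compiled: bool, correct: bool, speedup: float) -> float:
    if not compiled or not correct:
        return 0.0
    return 1.0 + max(0.0, speedup - 1.0)

rewards = [kernel_reward(*r) for r in
           [(False, False, 0.0),  # compile failure
            (True, False, 0.0),   # wrong results
            (True, True, 0.9),    # correct but slower than reference
            (True, True, 2.5)]]   # correct, 2.5x speedup
```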