hkust-nlp/drkernel-8b-coldstart
The hkust-nlp/drkernel-8b-coldstart is an 8 billion parameter Qwen3-based causal language model developed by hkust-nlp. This model serves as the cold-start supervised fine-tuning (SFT) checkpoint for the DR.Kernel project, specifically designed for generating and refining structured kernel-optimization responses using the DR.Kernel prompt format. It is primarily intended as an initialization point for further reinforcement learning (RL) training stages, providing a strong SFT baseline for kernel generation tasks.
Loading preview...
DR.Kernel-8B-ColdStart Overview
The hkust-nlp/drkernel-8b-coldstart is an 8 billion parameter model based on the Qwen3 architecture, developed by hkust-nlp. It represents the cold-start supervised fine-tuning (SFT) checkpoint for the DR.Kernel project, focusing on generating structured kernel-optimization responses. This model is trained exclusively on multi-turn SFT data from the hkust-nlp/drkernel-coldstart-8k dataset, which teaches kernel-generation and refinement behaviors.
Key Capabilities & Purpose
- Structured Kernel Optimization: Specializes in producing optimized kernel code, particularly for Triton kernels, by transforming existing PyTorch operators.
- Initialization for RL: Designed as the foundational checkpoint for subsequent reinforcement learning (RL) stages (TRLOO, MRS, PR, PRS) within the DR.Kernel framework.
- Strong SFT Baseline: Provides a robust supervised fine-tuning baseline for tasks involving kernel generation and optimization.
- Ablation Studies: Useful for researchers conducting ablations to compare performance between cold-start and post-RL checkpoints.
Intended Use Cases
- RL Training Initialization: The primary use is to serve as the starting point for DR.Kernel's reinforcement learning training.
- Kernel Generation Baseline: Can be used as a strong SFT model for generating optimized Triton kernels.
- Research & Development: Ideal for experimental setups and comparative analysis in kernel optimization research.
This model does not include the final performance claims of the full DR.Kernel RL results and is not intended for safety-critical production deployment without further verification.