hkust-nlp/drkernel-14b-coldstart

Text Generation · Concurrency Cost: 1 · Model Size: 14B · Quant: FP8 · Ctx Length: 32k · Published: Feb 5, 2026 · Architecture: Transformer

The hkust-nlp/drkernel-14b-coldstart is a 14 billion parameter Qwen3-based causal language model developed by hkust-nlp. This cold-start supervised fine-tuning (SFT) checkpoint is designed for generating and refining structured kernel-optimization responses, with the goal of replacing PyTorch operators with custom Triton kernels. It serves as an initialization point for subsequent reinforcement learning (RL) training stages and also functions as a strong SFT baseline for kernel-generation tasks.


DR.Kernel-14B-ColdStart Overview

The hkust-nlp/drkernel-14b-coldstart is a 14 billion parameter model based on the Qwen3 architecture, developed by hkust-nlp. This specific release represents the cold-start supervised fine-tuning (SFT) checkpoint for the DR.Kernel project. It has been trained exclusively on multi-turn SFT data, specifically the hkust-nlp/drkernel-coldstart-8k dataset, to teach the model kernel-generation and refinement behaviors.
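For reference, below is a minimal loading sketch using the Hugging Face transformers library. It assumes the checkpoint is hosted under the repository id above and exposes the standard causal-LM interface; it is not an official usage snippet from the DR.Kernel project.

```python
# Minimal loading sketch (assumes the checkpoint follows the standard
# transformers causal-LM interface; not an official DR.Kernel example).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hkust-nlp/drkernel-14b-coldstart"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the precision stored in the checkpoint
    device_map="auto",    # place/shard the model across available GPUs
)
```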

Key Capabilities

  • Structured Kernel Optimization: Specializes in generating and refining code that replaces standard PyTorch operators with custom Triton kernels, aiming for performance optimization (see the sketch after this list).
  • Initialization for RL: Primarily intended as an initialization checkpoint for subsequent reinforcement learning (RL) stages, including TRLOO, MRS, PR, and PRS.
  • Strong SFT Baseline: Functions as a robust baseline for kernel generation tasks, useful for ablations comparing cold-start versus post-RL checkpoints.
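As a hedged illustration of the kernel-optimization capability, the sketch below prompts the loaded model to rewrite a small PyTorch operator as a Triton kernel. It reuses the tokenizer and model from the loading sketch above; the exact instruction wording and any structured response format that DR.Kernel expects are assumptions, only the transformers API calls are standard.

```python
# Hypothetical kernel-optimization prompt; the instruction format is an
# assumption, not the project's documented protocol.
pytorch_op = """
import torch

def fused_add_relu(x, y):
    return torch.relu(x + y)
"""

messages = [
    {
        "role": "user",
        "content": (
            "Replace the PyTorch operator below with an equivalent custom "
            "Triton kernel and explain the optimization.\n" + pytorch_op
        ),
    },
]

# Format the conversation with the tokenizer's chat template and generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```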

Intended Use Cases

  • RL Training Initialization: Use this model as the starting point for further RL training to develop the full DR.Kernel model.
  • Kernel Generation Baseline: Employ it as a strong SFT baseline for tasks involving the generation of optimized Triton kernels.
  • Ablation Studies: Ideal for research and development to compare the performance and characteristics of a cold-start SFT model against models that have undergone additional RL training.
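For the ablation use case, one simple setup is to run identical prompts through the cold-start checkpoint and a post-RL checkpoint and compare the generated kernels. The sketch below is only illustrative: the post-RL checkpoint id is a placeholder you must supply, and the prompt is an arbitrary example rather than an evaluation prompt from the project.

```python
# Hypothetical ablation harness: compare cold-start vs. post-RL outputs on the
# same prompt. The post-RL checkpoint id is a placeholder, not a real repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = {
    "coldstart": "hkust-nlp/drkernel-14b-coldstart",
    "post_rl": "<path-or-id-of-post-RL-checkpoint>",  # placeholder
}

prompt = [
    {"role": "user",
     "content": "Rewrite torch.nn.functional.softmax as a custom Triton kernel."},
]

for name, ckpt in CHECKPOINTS.items():
    tok = AutoTokenizer.from_pretrained(ckpt)
    lm = AutoModelForCausalLM.from_pretrained(
        ckpt, torch_dtype="auto", device_map="auto"
    )
    ids = tok.apply_chat_template(
        prompt, add_generation_prompt=True, return_tensors="pt"
    ).to(lm.device)
    out = lm.generate(ids, max_new_tokens=1024)
    print(f"=== {name} ===")
    print(tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
```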