facebook/KernelLLM

Text generation · 8B parameters · FP8 quantization · 32k context length · Published: Apr 14, 2025 · License: other · Architecture: Transformer

facebook/KernelLLM is an 8 billion parameter large language model, based on Llama 3.1 Instruct, specifically fine-tuned by Meta for authoring GPU kernels using Triton. It translates PyTorch modules into efficient Triton kernel implementations, aiming to democratize GPU programming. The model demonstrates competitive or superior performance on kernel generation tasks compared to much larger models, as evaluated on KernelBench-Triton.


KernelLLM: Specialized for GPU Kernel Generation

KernelLLM, developed by Meta, is an 8 billion parameter language model built upon Llama 3.1 Instruct, uniquely fine-tuned for generating GPU kernels using Triton. Its primary purpose is to translate PyTorch modules into optimized Triton kernel implementations, making high-performance GPU programming more accessible.

Key Capabilities & Differentiators

  • Specialized Kernel Generation: Trained on approximately 25,000 paired examples of PyTorch modules and their Triton kernel equivalents, along with synthetic data from the KernelBook dataset.
  • Performance: On KernelBench-Triton Level 1, KernelLLM's 8B parameter model achieves a score of 20.2 (pass@1) and 57.1 (pass@20), outperforming significantly larger models like GPT-4o (~200B parameters) and DeepSeek V3 (671B parameters) in single-shot performance.
  • Efficiency: Aims to automate the generation of efficient Triton implementations, addressing the growing demand for tailored kernel solutions in diverse accelerator architectures.
  • Workflow: Integrates into a workflow where it translates PyTorch code into Triton kernel candidates, which are then validated against unit tests to select the best implementation.
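The generate-then-validate workflow above can be sketched in a few lines of plain Python. This is an illustrative harness, not code from the model card: the toy lambda "kernels" stand in for generated Triton candidates, and the selection logic mirrors the pass@k idea of keeping the first candidate that clears the unit tests.

```python
def validate(candidate, reference, test_inputs, tol=1e-6):
    """Return True if candidate matches the reference op on all test inputs."""
    try:
        for args in test_inputs:
            if abs(candidate(*args) - reference(*args)) > tol:
                return False
        return True
    except Exception:
        # Generated code may raise (syntax, shape, or type errors).
        return False

def best_of_k(candidates, reference, test_inputs):
    """pass@k-style selection: the first validated candidate wins."""
    for cand in candidates:
        if validate(cand, reference, test_inputs):
            return cand
    return None

# Toy stand-ins: a reference op and two "generated" candidates.
reference = lambda x, y: x * y + 1.0
buggy     = lambda x, y: x * y          # misses the +1 epilogue
correct   = lambda x, y: x * y + 1.0

tests = [(2.0, 3.0), (0.5, -4.0)]
chosen = best_of_k([buggy, correct], reference, tests)
```

In the real pipeline the candidates would be Triton kernel sources sampled from the model, and validation would compare tensors under a numerical tolerance rather than scalars.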

Intended Use Cases

  • GPU Programming: Ideal for developers and researchers looking to automate and optimize the creation of high-performance GPU kernels.
  • PyTorch to Triton Translation: Specifically designed for converting PyTorch modules into Triton code.
  • Commercial and Research: Intended for use in English and relevant programming languages (Python, Triton) for both commercial applications and academic research.
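A minimal usage sketch for the PyTorch-to-Triton use case follows. The prompt wording here is an assumption for illustration (the card does not specify a template); the generation step is shown commented out because it requires downloading the 8B weights and a GPU.

```python
# Hypothetical prompt construction for PyTorch -> Triton translation.
PYTORCH_SRC = '''import torch
import torch.nn as nn

class Model(nn.Module):
    def forward(self, x, y):
        return x * y + 1.0
'''

def build_prompt(pytorch_src: str) -> str:
    """Assumed instruction format -- not taken from the model card."""
    return (
        "Rewrite the following PyTorch module as an equivalent "
        "Triton kernel with a matching Python wrapper:\n\n"
        + pytorch_src
    )

prompt = build_prompt(PYTORCH_SRC)

# Actual generation (requires GPU and the model weights):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("facebook/KernelLLM")
# model = AutoModelForCausalLM.from_pretrained("facebook/KernelLLM")
# ids = tok(prompt, return_tensors="pt").input_ids
# out = model.generate(ids, max_new_tokens=512)
# print(tok.decode(out[0], skip_special_tokens=True))
```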

Limitations

  • May produce incorrect API references and syntax errors, and may struggle with instruction following.
  • Generated code can structurally resemble compiler-generated output and may not always implement a meaningful kernel.
  • Common failure modes include errors in variable naming, tensor shapes, type handling, and numerical precision.
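Because of these failure modes, generated kernels should be checked before any expensive numerical validation. A minimal triage sketch (an illustration, not part of the KernelLLM tooling): `compile()` catches syntax errors cheaply, and executing in a scratch namespace surfaces hallucinated imports and missing names early.

```python
def triage(source: str):
    """Classify generated kernel source as 'syntax', 'exec', or 'ok'."""
    try:
        code = compile(source, "<generated>", "exec")
    except SyntaxError as e:
        return "syntax", e
    ns = {}
    try:
        exec(code, ns)  # catches NameError/ImportError from bad API references
    except Exception as e:
        return "exec", e
    return "ok", None

good       = "def k(x):\n    return x + 1\n"
bad_syntax = "def k(x)\n    return x + 1\n"     # missing colon
bad_api    = "import not_a_real_module_xyz\n"   # hallucinated import
```

Only sources that pass triage would proceed to the unit-test validation stage described above.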
