celestialcreator/Llama-3.2-1B-MTP-k8
Text generation · Concurrency cost: 1 · Model size: 1B · Quantization: BF16 · Context length: 32k · Published: Mar 5, 2026 · License: llama3.2 · Architecture: Transformer

celestialcreator/Llama-3.2-1B-MTP-k8 is a 1-billion-parameter Llama-3.2 model adapted for Multi-Token Prediction (MTP) via online self-distillation, based on the arXiv paper 2602.06019. The model is trained to predict multiple future tokens per forward pass, which raises inference throughput with minimal quality degradation. It supports a 32,768-token context length and targets efficient generation on consumer hardware using ConfAdapt decoding.
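To make the decoding idea concrete, here is a toy sketch of confidence-adaptive acceptance for MTP drafts: the model proposes k future tokens at once, and the longest prefix whose per-token confidence clears a threshold is emitted in a single step. The function name, the fixed threshold, and the acceptance rule are illustrative assumptions, not the actual ConfAdapt algorithm from the paper.

```python
def accept_draft(draft_tokens, confidences, threshold=0.8):
    """Accept the longest prefix of the drafted tokens whose per-token
    confidence stays above `threshold`.

    Toy stand-in for confidence-adaptive acceptance; the real ConfAdapt
    rule in the paper may use a different criterion.
    """
    accepted = []
    for tok, conf in zip(draft_tokens, confidences):
        if conf < threshold:
            break  # stop at the first low-confidence draft token
        accepted.append(tok)
    return accepted

# With k=8 drafted tokens, high early confidence lets several tokens
# be emitted per model call instead of one.
print(accept_draft(["the", "cat", "sat", "on"], [0.95, 0.9, 0.6, 0.99]))
# → ['the', 'cat']
```

When confidence drops, only the verified prefix is kept and generation resumes from there, so quality is preserved while the average tokens-per-step rises.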
