jwkirchenbauer/L3-1-8B-Magpie-MTP
Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Feb 10, 2026 · Architecture: Transformer

jwkirchenbauer/L3-1-8B-Magpie-MTP is an 8-billion-parameter language model with a 32,768-token context length, developed by jwkirchenbauer. It is trained with a Multi-Token Prediction (MTP) objective, allowing it to predict multiple future tokens in a single forward pass, and it ships with a custom generation API designed for accelerated decoding, making it particularly efficient when inference speed is critical.


Overview

jwkirchenbauer/L3-1-8B-Magpie-MTP is an 8-billion-parameter language model trained with a Multi-Token Prediction (MTP) objective. Unlike standard autoregressive models that generate one token per forward pass, this model can predict multiple future tokens (up to k per step) in a single pass, significantly accelerating inference.
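The speedup can be illustrated with a toy decoding loop. The "model" below is just a stand-in that continues an integer sequence (not the real L3-1-8B model), but it shows the core accounting: emitting k tokens per forward pass cuts the number of passes by roughly a factor of k while producing the same output.

```python
# Toy illustration of why multi-token prediction (MTP) accelerates decoding.
# A standard autoregressive model emits 1 token per forward pass; an MTP
# model emits up to k tokens per pass. `toy_forward` is a stand-in model
# that simply continues an integer sequence.

def toy_forward(context, k=1):
    """Pretend forward pass: predict the next k tokens after `context`."""
    start = context[-1] + 1 if context else 0
    return list(range(start, start + k))

def generate(prompt, new_tokens, k=1):
    """Decode `new_tokens` tokens; return (tokens, forward_pass_count)."""
    tokens, passes = list(prompt), 0
    while len(tokens) - len(prompt) < new_tokens:
        need = new_tokens - (len(tokens) - len(prompt))
        tokens += toy_forward(tokens, k=min(k, need))
        passes += 1
    return tokens[len(prompt):], passes

baseline, n_base = generate([0], new_tokens=12, k=1)  # one token per pass
mtp, n_mtp = generate([0], new_tokens=12, k=4)        # four tokens per pass
assert baseline == mtp and n_mtp < n_base             # same output, fewer passes
```

In the real model the k predicted tokens come from the MTP heads rather than k sequential forward passes, which is where the wall-clock savings come from.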

Key Capabilities

  • Accelerated Inference: Utilizes a custom generate() implementation to predict k tokens simultaneously, bypassing the need for auxiliary draft models.
  • Adaptive Decoding: Features an adaptive mode (ConfAdapt) that dynamically adjusts the number of predicted tokens based on the model's confidence, balancing speed and accuracy.
  • Custom Generation API: Requires trust_remote_code=True to enable its specialized generation logic, offering flexible control over decoding strategies.
  • Configurable Strategies: Supports fixed-K generation for consistent acceleration and adaptive strategies like conf_adapt for nearly lossless, variable acceleration.
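The adaptive mode can be sketched as an accept-the-confident-prefix loop. This is a minimal illustration of the ConfAdapt idea, not the model's actual implementation: the confidence scores here are random stand-ins, and the threshold value is an assumption.

```python
import random

# Toy sketch of confidence-adaptive MTP ("ConfAdapt"-style): propose k
# candidate tokens per pass, but only accept the leading run whose
# per-token confidence clears a threshold. Confidences are random
# stand-ins, not real model scores; the threshold is an assumption.

def propose(context, k):
    """Pretend MTP head: k candidate tokens, each with a confidence score."""
    start = context[-1] + 1
    return [(tok, random.random()) for tok in range(start, start + k)]

def conf_adapt_step(context, k=4, threshold=0.3):
    """Accept the longest confident prefix of candidates (always >= 1 token)."""
    accepted = []
    for token, conf in propose(context, k):
        if conf < threshold and accepted:
            break  # stop at the first low-confidence token after the prefix
        accepted.append(token)
    return accepted

random.seed(0)
ctx = [0]
while len(ctx) < 20:
    ctx += conf_adapt_step(ctx)  # variable number of tokens per pass
```

Because at least one token is always accepted, the adaptive mode degrades to ordinary one-token-per-pass decoding in the worst case, which is what makes it nearly lossless.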

Usage Notes

To leverage MTP, users must pass do_mtp=True to the generate() function and specify the correct mask_id and eos_id for the model. MTP generation currently supports single examples only; batched inputs are not supported.
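Putting the pieces together, a minimal usage sketch might look like the following. The do_mtp, mask_id, and eos_id arguments and the trust_remote_code requirement come from this model card; the specific special-token ids, dtype, and other generate() arguments are assumptions that should be checked against the repository's custom code.

```python
# Hedged sketch of loading the model and calling its custom generate().
# do_mtp / mask_id / eos_id follow the model card; the token-id choices
# and remaining arguments are assumptions to verify against the model's
# remote code (loaded via trust_remote_code=True).

def run_mtp_generation(prompt: str) -> str:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "jwkirchenbauer/L3-1-8B-Magpie-MTP"
    tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        trust_remote_code=True,      # required for the custom generation logic
        torch_dtype=torch.bfloat16,  # assumption: dtype not specified by the card
        device_map="auto",
    )

    # Single example only: the MTP path does not support batching.
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        do_mtp=True,                 # enable multi-token prediction decoding
        mask_id=tok.mask_token_id,   # assumption: mask id taken from the tokenizer
        eos_id=tok.eos_token_id,
        max_new_tokens=256,
    )
    return tok.decode(out[0], skip_special_tokens=True)
```

If the tokenizer does not define a mask token, the correct mask_id must be read from the model's configuration or remote code instead.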