The jwkirchenbauer/L3-1-8B-Magpie-MTP is an 8-billion-parameter language model with a 32,768-token context length, developed by jwkirchenbauer. It is trained with a Multi-Token Prediction (MTP) objective, allowing it to predict multiple future tokens in a single forward pass, and ships with a custom generation API designed for accelerated decoding, making it particularly efficient when inference speed is critical.
Overview
The jwkirchenbauer/L3-1-8B-Magpie-MTP is an 8-billion-parameter language model trained with a Multi-Token Prediction (MTP) objective. Unlike standard autoregressive models that generate one token at a time, this model can predict the next k tokens in a single forward pass, significantly accelerating inference.
Key Capabilities
- Accelerated Inference: Utilizes a custom `generate()` implementation to predict `k` tokens simultaneously, bypassing the need for auxiliary draft models.
- Adaptive Decoding: Features an adaptive mode (ConfAdapt) that dynamically adjusts the number of predicted tokens based on the model's confidence, balancing speed and accuracy.
- Custom Generation API: Requires `trust_remote_code=True` to enable its specialized generation logic, offering flexible control over decoding strategies.
- Configurable Strategies: Supports fixed-K generation for consistent acceleration and adaptive strategies such as `conf_adapt` for nearly lossless, variable acceleration.
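The difference between fixed-K and confidence-adaptive decoding can be illustrated with a small standalone sketch. This is not the model's actual implementation; the function name and threshold are illustrative. Given per-position confidences from one MTP forward pass, a ConfAdapt-style rule keeps the longest prefix of predicted tokens whose confidence stays above a threshold, while fixed-K mode always keeps exactly K:

```python
def accept_mtp_tokens(confidences, threshold=0.9, fixed_k=None):
    """Decide how many of the k tokens predicted in one forward pass to keep.

    confidences: per-token confidence scores (e.g. softmax probabilities)
                 for the k speculated positions, in order.
    fixed_k:     if set, emulate fixed-K mode and accept exactly
                 min(fixed_k, k) tokens; otherwise use the adaptive rule.
    """
    if fixed_k is not None:
        return min(fixed_k, len(confidences))
    accepted = 0
    for c in confidences:
        if c < threshold:
            break  # stop at the first low-confidence position
        accepted += 1
    # Always keep at least one token so decoding makes progress.
    return max(accepted, 1)
```

The adaptive rule trades a variable amount of acceleration for accuracy: on easy continuations it accepts many tokens per pass, and on hard ones it falls back toward one token per pass, which is why the acceleration is nearly lossless.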
Usage Notes
To leverage MTP, users must pass `do_mtp=True` to the `generate()` function and specify the correct `mask_id` and `eos_id` for the model. MTP generation currently supports single-example generation only; batching is not supported.
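A minimal usage sketch follows. The `do_mtp`, `mask_id`, and `eos_id` arguments are named in this card, but their exact values and the full `generate()` signature come from the model's custom remote code, so treat the specifics below (including the placeholder mask-token lookup) as assumptions to verify against the repository:

```python
def mtp_generate(prompt: str,
                 model_id: str = "jwkirchenbauer/L3-1-8B-Magpie-MTP") -> str:
    """Sketch of single-example MTP generation (no batching support)."""
    # Imports kept local so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # trust_remote_code=True is required for the custom generation logic.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")  # single example only
    output = model.generate(
        **inputs,
        do_mtp=True,                        # enable multi-token prediction
        mask_id=tokenizer.mask_token_id,    # placeholder: check the repo for the real ID
        eos_id=tokenizer.eos_token_id,
        max_new_tokens=128,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because the generation path is custom, keyword names beyond those documented here (for example, a flag selecting `conf_adapt` versus fixed-K) should be taken from the model repository's own code rather than assumed.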