beomi/qwen3-8b-dmax
The beomi/qwen3-8b-dmax is an 8-billion-parameter variant of the Qwen3-8B model, fine-tuned with a JAX-trained DMax/OPUT (block-diffusion / on-policy under-tuning) method. The model is designed for block-diffusion inference and expects a doubled '[noised; clean]' input under a block-diffusion mask, which distinguishes it from standard autoregressive Qwen3 models. It supports a 32768-token context length and is optimized for generative tasks that require this input format.
Overview
The beomi/qwen3-8b-dmax is an 8-billion-parameter model derived from Qwen/Qwen3-8B, developed by beomi. It has been fine-tuned using a JAX-based DMax/OPUT (block-diffusion / on-policy under-tuning) training framework, specifically JAX/Flax NNX on TPUs. Unlike its base model, it is not a standard autoregressive Qwen3; it is designed for block-diffusion inference.
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Training Method: JAX-trained DMax/OPUT (block-diffusion / on-policy under-tuning)
- Inference Requirement: requires the dllm-jax DMax block-diffusion path, expecting a '[noised; clean]' input format under a block-diffusion mask
- Parameter Count: 8 billion parameters
- Context Length: 32768 tokens
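To make the '[noised; clean]' convention concrete, the sketch below builds a doubled input sequence and a block-causal attention mask in JAX. This is an illustrative assumption, not the dllm-jax API: the function name, the Bernoulli noising scheme, and the exact mask layout are all hypothetical stand-ins for whatever the real block-diffusion path uses.

```python
import jax
import jax.numpy as jnp


def build_block_diffusion_inputs(clean_ids, noise_token_id, block_size, key):
    """Sketch of a doubled [noised; clean] input with a block-diffusion mask.

    clean_ids: (T,) int32 token ids; T is assumed divisible by block_size.
    Returns a (2T,) input sequence and a (2T, 2T) boolean attention mask.
    All conventions here are assumptions for illustration.
    """
    T = clean_ids.shape[0]

    # Noise the first half by randomly masking tokens -- a simple stand-in
    # for the actual forward diffusion process, whose details are unknown.
    noise = jax.random.bernoulli(key, p=0.5, shape=(T,))
    noised_ids = jnp.where(noise, noise_token_id, clean_ids)

    # Doubled input: noised copy followed by the clean copy.
    input_ids = jnp.concatenate([noised_ids, clean_ids])

    # One plausible block-diffusion mask: both halves share block indices,
    # positions attend freely within their block and causally to earlier
    # blocks (block-causal rather than token-causal attention).
    blocks = (jnp.arange(2 * T) % T) // block_size
    mask = blocks[:, None] >= blocks[None, :]
    return input_ids, mask
```

A mask built this way lets every position in block k (in either half) see all tokens in blocks 0..k, so denoising a block can condition on both its noisy copy and the clean prefix.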
Use Cases
This model is particularly suited for research and applications that leverage block-diffusion generative processes. Its specialized training and inference requirements make it ideal for developers exploring advanced generative models that move beyond traditional autoregressive approaches, especially within the JAX/Flax ecosystem.