beomi/qwen3-8b-dmax

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

beomi/qwen3-8b-dmax is an 8-billion-parameter variant of Qwen3-8B, fine-tuned with a JAX-based DMax/OPUT (block-diffusion / on-policy under-tuning) method. It is designed for block-diffusion inference and expects a doubled [noised; clean] input under a block-diffusion mask, which distinguishes it from standard autoregressive Qwen3 models. It supports a 32,768-token context length and is intended for generative tasks built around this input format.


Overview

beomi/qwen3-8b-dmax is an 8-billion-parameter model derived from Qwen/Qwen3-8B and developed by beomi. It was fine-tuned with a JAX-based DMax/OPUT (block-diffusion / on-policy under-tuning) training framework, implemented in JAX/Flax NNX and run on TPUs. Unlike its base model, it is not a standard autoregressive Qwen3; it is designed for block-diffusion inference, illustrated conceptually below.
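To make the contrast with autoregressive decoding concrete, here is a minimal conceptual sketch of block-diffusion-style decoding in JAX: a whole block of masked positions is filled over a few confidence-based unmasking steps rather than one token at a time. The `forward` stand-in, `MASK_ID`, `BLOCK`, `STEPS`, and `VOCAB` values are placeholders for illustration, not the dllm-jax implementation.

```python
# Conceptual sketch only: confidence-based block denoising vs. token-by-token decoding.
# All constants and the forward pass below are hypothetical placeholders.
import jax
import jax.numpy as jnp

MASK_ID = 151669   # hypothetical mask-token id
BLOCK = 32         # hypothetical block size
STEPS = 4          # denoising steps per block (assumption)
VOCAB = 151936     # placeholder vocabulary size


def forward(ids: jnp.ndarray) -> jnp.ndarray:
    """Stand-in for the model forward pass; returns (len(ids), VOCAB) logits."""
    return jax.random.normal(jax.random.PRNGKey(0), (ids.shape[0], VOCAB))


def denoise_block(ids: jnp.ndarray, start: int) -> jnp.ndarray:
    """Generate one block by unmasking its most confident positions over a few steps."""
    idx = jnp.arange(start, start + BLOCK)
    ids = ids.at[idx].set(MASK_ID)                             # the block starts fully masked
    for step in range(1, STEPS + 1):
        probs = jax.nn.softmax(forward(ids)[idx], axis=-1)     # (BLOCK, VOCAB)
        pred = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        # Commit the k most confident positions at this step; k grows each step
        # until the whole block is filled on the final step.
        k = step * BLOCK // STEPS
        top = jnp.argsort(-conf)[:k]
        ids = ids.at[idx[top]].set(pred[top])
    return ids
```

In an autoregressive model the inner loop would instead sample one next token per forward pass; here each forward pass refines an entire block, which is why the model needs the specialized input format and masking described below.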

Key Characteristics

  • Base Model: Qwen/Qwen3-8B.
  • Training Method: JAX-trained DMax/OPUT (block-diffusion / on-policy under-tuning).
  • Inference Requirement: requires the dllm-jax DMax block-diffusion path and expects a doubled [noised; clean] input under a block-diffusion mask (see the input-format sketch after this list).
  • Parameter Count: 8 billion parameters.
  • Context Length: 32,768 tokens.
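As a rough illustration of that input format, the sketch below builds a doubled [noised; clean] sequence and a block-wise attention mask in JAX. The mask-token id, block size, noise rate, and visibility rule are assumptions made for illustration; the exact tensors dllm-jax expects may differ.

```python
# Illustrative sketch (not the dllm-jax API): build the doubled "[noised; clean]"
# input and a block-wise attention mask for one diffusion step. MASK_TOKEN_ID,
# BLOCK_SIZE, the noise rate, and the masking rule are assumptions.
import jax
import jax.numpy as jnp

MASK_TOKEN_ID = 151669   # hypothetical mask-token id
BLOCK_SIZE = 32          # hypothetical diffusion block size


def make_dmax_inputs(clean_ids: jnp.ndarray, key: jax.Array, noise_rate: float = 0.5):
    """Return (doubled_ids, attention_mask) for one block-diffusion step."""
    seq_len = clean_ids.shape[0]

    # Noise the clean sequence by replacing a random subset of tokens with the mask token.
    noise = jax.random.bernoulli(key, p=noise_rate, shape=(seq_len,))
    noised_ids = jnp.where(noise, MASK_TOKEN_ID, clean_ids)

    # The model expects the two halves concatenated as [noised; clean].
    doubled_ids = jnp.concatenate([noised_ids, clean_ids], axis=0)   # (2 * seq_len,)

    # Block-wise visibility over the doubled sequence (a simplified rule):
    #   - noised queries see their own noised block and all earlier clean blocks,
    #   - clean queries see clean blocks up to and including their own.
    pos = jnp.arange(2 * seq_len)
    is_clean = pos >= seq_len
    blk = (pos % seq_len) // BLOCK_SIZE
    q_clean, k_clean = is_clean[:, None], is_clean[None, :]
    q_blk, k_blk = blk[:, None], blk[None, :]
    noised_rule = ~q_clean & ((~k_clean & (q_blk == k_blk)) | (k_clean & (k_blk < q_blk)))
    clean_rule = q_clean & k_clean & (k_blk <= q_blk)
    attention_mask = noised_rule | clean_rule                        # (2*seq_len, 2*seq_len)

    return doubled_ids, attention_mask


# Toy example: a 64-token sequence produces a 128-token doubled input.
ids, mask = make_dmax_inputs(jnp.arange(64, dtype=jnp.int32), jax.random.PRNGKey(0))
print(ids.shape, mask.shape)   # (128,), (128, 128)
```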

Use Cases

This model is suited to research and applications that build on block-diffusion generative processes. Its specialized training and inference path make it a good fit for developers exploring generative models beyond traditional autoregressive decoding, especially within the JAX/Flax ecosystem.