AEON-7/DFlash-Qwen3.5-27B-Uncensored
AEON-7/DFlash-Qwen3.5-27B-Uncensored is a 27-billion-parameter hybrid linear-attention model based on Qwen3.5, developed by AEON-7. This unquantized BF16 model handles both vision and text inputs and is paired with DFlash speculative decoding for faster inference. Its architecture combines Gated Delta Network layers for long-context efficiency with standard full-attention layers for global context, making it suitable for responsive, high-quality multimodal applications. The model has also been abliterated to remove safety alignment, so it produces uncensored output.
AEON-7/DFlash-Qwen3.5-27B-Uncensored Overview
This model is a 27-billion-parameter, unquantized BF16 Qwen3.5 variant developed by AEON-7, with a hybrid linear-attention architecture and integrated vision-text capabilities. Its primary differentiator is DFlash block-diffusion speculative decoding, which significantly boosts inference speed on memory-bandwidth-limited hardware such as DGX Spark, reaching up to 33.2 tok/s in single-stream generation. This makes the dense 27B model competitive in speed with larger MoE models while retaining the quality advantages of a dense architecture.
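To see why speculative decoding matters on bandwidth-limited hardware, a rough ceiling for plain autoregressive decoding can be estimated by dividing memory bandwidth by the bytes of weights read per token. The sketch below is a back-of-envelope calculation, not a benchmark. The 273 GB/s bandwidth is an assumed value used for illustration (it is not stated in this card), and the estimate ignores KV-cache and recurrent-state traffic, the drafter's own cost, and compute limits.

```python
def decode_ceiling_tok_s(params: float, bytes_per_param: float,
                         bandwidth_bytes_s: float) -> float:
    """Upper bound on tokens/s if every token needs one full read of the weights."""
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_s / weight_bytes


PARAMS = 27e9              # from the card: 27 billion parameters
BYTES_PER_PARAM = 2        # BF16 stores each weight in 16 bits
ASSUMED_BANDWIDTH = 273e9  # bytes/s; ASSUMPTION for illustration, not from the card
REPORTED_TOK_S = 33.2      # from the card: single-stream peak

ceiling = decode_ceiling_tok_s(PARAMS, BYTES_PER_PARAM, ASSUMED_BANDWIDTH)
ratio = REPORTED_TOK_S / ceiling

print(f"naive one-token-per-pass ceiling: {ceiling:.1f} tok/s")
print(f"reported speed relative to that ceiling: {ratio:.1f}x")
```

Under these assumptions the naive ceiling is about 5 tok/s, so the reported figure is only reachable if each pass over the target weights yields several tokens on average. That is the effect speculative decoding aims for.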
Key Capabilities & Features
- DFlash Speculative Decoding: Utilizes a 2B block-diffusion drafter (z-lab/Qwen3.5-27B-DFlash) to amortize memory bandwidth costs, leading to substantial throughput improvements.
- Hybrid Attention Architecture: Combines 48 Gated Delta Network (GDN) layers, which keep a fixed-size recurrent state (O(1) memory per additional token) for efficient long-context processing, with 16 full-attention layers for global context capture. The maximum context length is 131,072 tokens.
- Vision + Text: Incorporates a 27-layer ViT vision encoder (460M parameters) for multimodal understanding and generation.
- Uncensored Output: Created using an orthogonal projection abliteration technique to remove safety alignment, resulting in a model with no built-in refusal behavior.
- Dense Architecture Advantages: Offers higher quality per parameter, predictable latency, and simpler deployment compared to MoE models, as every one of its 27B parameters contributes to every token.
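The draft-and-verify idea behind speculative decoding can be sketched as follows. This is a minimal greedy-verification toy, not the DFlash implementation: `verify_block`, `speculative_generate`, `toy_target`, and `toy_drafter` are hypothetical names, the "models" are simple arithmetic functions, and the toy drafter builds its block sequentially, whereas a block-diffusion drafter would propose the block in parallel. A real system would also score the whole drafted block in one batched forward pass of the target model rather than calling it per position.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]              # prefix -> greedy next token
DraftBlock = Callable[[List[int], int], List[int]]  # (prefix, k) -> k drafted tokens


def verify_block(prefix: List[int], draft: List[int],
                 target_next: NextToken) -> List[int]:
    """Accept the longest draft prefix the target agrees with, plus one target token."""
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)
    # Whole block accepted: the verification pass also yields one bonus token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def speculative_generate(prefix: List[int], draft_block: DraftBlock,
                         target_next: NextToken, block_size: int,
                         max_new: int) -> Tuple[List[int], int]:
    """Generate max_new tokens; return (tokens, number of verification passes)."""
    out, passes = list(prefix), 0
    while len(out) - len(prefix) < max_new:
        draft = draft_block(out, block_size)
        out.extend(verify_block(out, draft, target_next))
        passes += 1
    return out[:len(prefix) + max_new], passes


def toy_target(ctx: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[int], k: int) -> List[int]:
    """Toy drafter: mimics the target but deliberately errs whenever 7 is due."""
    block, last = [], ctx[-1]
    for _ in range(k):
        nxt = (last + 1) % 10
        if nxt == 7:
            nxt = 0  # deliberate drafting mistake
        block.append(nxt)
        last = nxt
    return block


tokens, passes = speculative_generate([0], toy_drafter, toy_target,
                                      block_size=4, max_new=12)
print(tokens, passes)
```

In this toy run the output matches what the target alone would produce greedily, but it takes 3 verification passes instead of 12 single-token steps. Drafting errors cost only the rejected tail of a block and never change the result.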
When to Use This Model
This model is ideal for developers seeking a high-quality, dense 27B model that delivers responsive and fluid inference speeds, especially on hardware where memory bandwidth is a bottleneck. Its uncensored nature makes it suitable for research or applications requiring unfiltered output. The multimodal capabilities extend its utility to tasks involving both image and text understanding. For NVIDIA Blackwell GPUs, an NVFP4 version is available, offering a 3x memory reduction and hardware-accelerated performance with effectively lossless quality.
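The 3x memory figure for the NVFP4 version can be sanity-checked with simple arithmetic. The sketch below assumes NVFP4 stores 4-bit values with one 8-bit scale per 16-element block (about 4.5 bits per weight) and that some fraction of the weights stays in BF16. The 5% fraction is a hypothetical value chosen for illustration, not a number from this card, and `memory_reduction` is a hypothetical helper.

```python
BF16_BITS = 16.0
NVFP4_VALUE_BITS = 4.0
NVFP4_SCALE_BITS = 8.0   # ASSUMPTION: one 8-bit scale shared by each block
NVFP4_BLOCK_SIZE = 16    # ASSUMPTION: 16 weights per scaled block


def memory_reduction(frac_kept_bf16: float) -> float:
    """Ratio of BF16 size to mixed NVFP4/BF16 size for a given unquantized fraction."""
    nvfp4_bits = NVFP4_VALUE_BITS + NVFP4_SCALE_BITS / NVFP4_BLOCK_SIZE  # 4.5 bits
    mixed_bits = frac_kept_bf16 * BF16_BITS + (1.0 - frac_kept_bf16) * nvfp4_bits
    return BF16_BITS / mixed_bits


PARAMS = 27e9  # from the card
bf16_gb = PARAMS * BF16_BITS / 8 / 1e9

for frac in (0.0, 0.05):  # 0.05 is a HYPOTHETICAL share of weights left in BF16
    r = memory_reduction(frac)
    print(f"kept in BF16: {frac:.0%}  reduction: {r:.2f}x  "
          f"weights: {bf16_gb / r:.1f} GB (from {bf16_gb:.0f} GB)")
```

With every weight quantized the reduction would be about 3.6x. Leaving a small share of weights in BF16 brings it closer to the 3x quoted above.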