eggdog100/Qwen3.6-35B_Zenith-Abliterated

TEXT GENERATION
Concurrency Cost: 3 | Model Size: 35.1B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 24, 2026 | License: cc-by-nc-4.0 | Architecture: Transformer | 0.0K | Open Weights | Cold

eggdog100/Qwen3.6-35B_Zenith-Abliterated is a 35.1 billion parameter, Qwen3.6-based hybrid-MoE language model with a 32K context length, developed by eggdog100. It is an "abliterated" derivative of Qwen3.6-35B_Zenith, edited to sharply reduce refusal behavior while preserving the original model's capabilities, coherence, and multimodal vision. The edit touches only the 10 full-attention layers, which makes the model suited to applications that need less restrictive content generation.

Overview

eggdog100/Qwen3.6-35B_Zenith-Abliterated is a 35.1 billion parameter model derived from eggdog100/Qwen3.6-35B_Zenith. It uses the qwen3_5_moe architecture: 40 layers (30 GatedDeltaNet linear-attention layers and 10 full-attention layers), 256 experts, and an intact vision tower. Its distinguishing feature is abliteration, a weight edit that suppresses refusal behavior: refusals on adversarial prompts drop to 10/100, compared with 85-98/100 for the base model, while core capabilities and multimodal vision are maintained.
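The layer counts above can be illustrated with a small sketch. The model card states only the totals (40 layers, 30 linear-attention, 10 full-attention); the placement of the full-attention layers at every fourth position, along with the names `FULL_ATTENTION_INTERVAL` and `layer_types`, is an assumption made here for illustration and is not confirmed by the source.

```python
# Hypothetical layout sketch. Only the counts (40 / 30 / 10) come from the
# model card; the every-fourth-layer interval is an assumption.
NUM_LAYERS = 40
FULL_ATTENTION_INTERVAL = 4  # assumed, not stated in the source


def layer_types(num_layers=NUM_LAYERS, interval=FULL_ATTENTION_INTERVAL):
    """Return one label per layer: 'full_attention' or 'linear_attention'."""
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


layout = layer_types()
# Zero-based indices of the layers that would carry full attention.
full_layers = [i for i, kind in enumerate(layout) if kind == "full_attention"]

print(len(full_layers), layout.count("linear_attention"))
print(full_layers)
```

Under this assumed interval the split works out to 10 full-attention and 30 linear-attention layers, matching the counts given above. The real per-layer assignment should be read from the model's configuration file rather than inferred.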

Key Differentiators

  • Refusal Suppression: Reduces refusal rates substantially through a minimal, verified weight edit.
  • Preserved Capabilities: The abliteration is tightly targeted, modifying only the 10 self_attn.o_proj layers, so that the model's coherence, math, code, translation, knowledge, and multi-step reasoning abilities, as well as its vision tower, are preserved.
  • Hybrid-MoE Architecture: Uses the Qwen3.6 hybrid (GatedDeltaNet + MoE) design. Note that llama.cpp support for this architecture is new, and throughput may be lower than for mature architectures.
  • Verified Edit: The edited model has a KL divergence of only 0.0083 from the base model, indicating minimal impact on overall behavior beyond refusal reduction.
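The weight edit and its verification can be sketched in a few lines. The model card does not describe the exact procedure, so the following assumes the commonly used form of abliteration (directional ablation), in which a "refusal direction" r is projected out of a weight matrix's output space, W' = W - r rᵀ W, and it assumes the KL divergence is measured between next-token distributions of the base and edited models. The function names, the toy matrix, and the direction of the KL comparison are all illustrative assumptions, not the author's code.

```python
import math


def ablate_direction(weight, direction):
    """Project `direction` out of the output space of `weight`.

    weight:    list of rows, shape (out_features, in_features)
    direction: vector of length out_features (normalized internally)
    Returns W' = W - r r^T W, so that W' x has no component along r.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    n_out, n_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    coeffs = [sum(r[i] * weight[i][j] for i in range(n_out)) for j in range(n_in)]
    return [
        [weight[i][j] - r[i] * coeffs[j] for j in range(n_in)]
        for i in range(n_out)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between the distributions implied by two logit vectors."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy example: remove the first output axis from a 2x2 matrix.
W = [[1.0, 2.0], [3.0, 4.0]]
r = [1.0, 0.0]
W_edit = ablate_direction(W, r)
print(W_edit)

# Identical logits give zero divergence; a small perturbation gives a small positive value.
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]))
```

In a real model the same projection would be applied to each targeted o_proj matrix, and the divergence would be averaged over many prompts and token positions. A value near zero, such as the reported 0.0083, suggests the two models' output distributions remain close on the evaluated prompts.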

Usage Notes

For best results, use the Qwen sampling parameters (temperature=0.6, top_p=0.95, top_k=20), and avoid greedy decoding and large repetition penalties. The model supports bf16 precision and is also available as GGUF quantizations, including a recommended Q4_K_M variant. A separate mmproj GGUF file is provided for vision input.
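As a minimal sketch, the recommended sampling settings can be collected into a single dictionary of generation arguments. The three values come from the usage note above; the helper name `qwen_sampling_kwargs`, the validation checks, and the `do_sample` key are illustrative assumptions. The key names follow the naming commonly used for generation settings in Hugging Face Transformers, but check them against whichever runtime is actually used.

```python
def qwen_sampling_kwargs(temperature=0.6, top_p=0.95, top_k=20):
    """Build sampling arguments from the recommended Qwen settings.

    Hypothetical helper: rejects settings that amount to greedy decoding,
    which the usage note advises against.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (greedy decoding is discouraged)")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    return {
        "do_sample": True,  # sample instead of greedy decoding
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
    }


sampling = qwen_sampling_kwargs()
print(sampling)
```

The resulting dictionary could be unpacked into a generation call (for example `model.generate(**inputs, **sampling)` in Transformers, assuming a loaded model and tokenized inputs). No repetition penalty is set here, in line with the advice to avoid large values.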