DavidAU/Qwen3.6-12B-IQ-Ultra-Heretic-Uncensored-Thinking

Vision · Concurrency Cost: 2 · Model Size: 27B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: May 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

DavidAU/Qwen3.6-12B-IQ-Ultra-Heretic-Uncensored-Thinking is a 12 billion parameter language model derived from the Qwen 3.6 27B architecture, with a 32K context length. DavidAU created it by taking a base 27B model that had been uncensored with Heretic, shrinking it to 12B parameters, and fine-tuning it across multiple datasets. The model is tuned for both general and specific use cases, including creative tasks, and retains the image/video training of its larger base.

Model Overview

DavidAU/Qwen3.6-12B-IQ-Ultra-Heretic-Uncensored-Thinking is a 12 billion parameter model based on the Qwen 3.6 27B architecture, with a 32K context length. It was built in three steps. First, the original Qwen 3.6 27B was uncensored using the Heretic method by P-E-W (the Heretic pass itself was performed by "trohrbaugh"). Second, the uncensored model was "shrunk" to 12B parameters by cutting it from 64 layers to 24 with a modified Mergekit. Third, the resulting 12B model was fine-tuned with Unsloth on local hardware across six datasets in two stages, with a focus on unifying its new layer structure.
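The layer reduction described above can be put into numbers with a short calculation. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the proportional estimate assumes that every parameter lives in the repeated layers, which is not true in practice (embeddings, vision components, and output heads are not removed with the layers). That assumption is one plausible reason the naive figure comes out below the stated 12B.

```python
def layer_keep_ratio(kept_layers: int, original_layers: int) -> float:
    """Fraction of transformer layers that survive the shrink."""
    if not 0 < kept_layers <= original_layers:
        raise ValueError("kept_layers must be in 1..original_layers")
    return kept_layers / original_layers


def naive_param_estimate(original_params: float, keep_ratio: float) -> float:
    """Scale the parameter count by the layer ratio.

    Assumption: all parameters sit in the repeated layers. Real models
    also have embeddings and other fixed-size parts, so this is a rough
    lower-end estimate rather than the true size.
    """
    return original_params * keep_ratio


# Figures taken from the model description: 64 layers -> 24, 27B base.
ratio = layer_keep_ratio(24, 64)
estimate = naive_param_estimate(27e9, ratio)

print(f"layers kept: {ratio:.1%}")
print(f"naive estimate: {estimate / 1e9:.1f}B parameters")
```

The model keeps 37.5% of the original layers, and the naive estimate lands near 10.1B parameters, in the same range as the advertised 12B.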

Key Capabilities & Features

  • Uncensored Nature: Derived from a Heretic-processed Qwen 3.6 27B, so its output is uncensored.
  • Efficient Architecture: Reduced to 12B parameters and 24 layers, which speeds up inference (e.g., 150 tokens/s with a Q4_K_S quant on a 5090 GPU).
  • Retained Multimodal Capabilities: Preserves the image/video training and associated systems of the full 27B model.
  • Fine-tuned for Versatility: General tuning covered math, code, creative, and reasoning tasks.
  • Accessibility for Further Tuning: Can be fine-tuned on the free tier of Google Colab or on local hardware with 12-16 GB of VRAM.
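To relate the parameter count and throughput figures above to hardware, a rough estimate of weight memory and generation time can be sketched as follows. The function names are hypothetical, and the calculation is a simplification: it counts weight storage only (no KV cache, activations, or optimizer state) and uses nominal bit widths, whereas real quantization formats carry extra per-block overhead.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: weights only, at a nominal bit width.
    """
    return params * bits_per_weight / 8 / 1e9


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a number of tokens at a steady throughput."""
    return tokens / tokens_per_second


PARAMS = 12e9  # 12B parameters, as stated for this model

for label, bits in [("16-bit", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{label:>6}: ~{weight_memory_gb(PARAMS, bits):.0f} GB of weights")

# Using the 150 tokens/s figure quoted for a 4-bit quant on a 5090 GPU.
print(f"1000 tokens: ~{generation_seconds(1000, 150):.1f} s")
```

Under these assumptions the weights take about 24 GB at 16-bit, 12 GB at FP8, and 6 GB at a nominal 4 bits, and 1000 tokens take under 7 seconds at the quoted rate.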

Important Considerations

  • The model may require additional tuning for optimal performance in specific or general use cases.
  • Knowledge or skills from the original 27B model might be missing due to the compression process.
  • To reach "full power," the model would benefit from an additional 25,000-50,000 training samples drawn from relevant datasets.