ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic

Tags: Vision, Open Weights, Cold
Concurrency Cost: 1
Model Size: 4.5B
Quant: BF16
Ctx Length: 32k
Tool Calling: Supported
Published: Mar 7, 2026
License: lgpl-3.0
Architecture: Transformer

ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic is a 4.5 billion parameter language model developed by Ghost, based on the Qwen3.5 architecture. It combines Gated DeltaNet layers with conventional attention, supports multimodal input natively, and has a 262K native context length. Its reasoning was distilled from Claude Opus 4.6, and the model was then abliterated with Heretic to remove safety refusals while preserving reasoning quality, which makes it suitable for applications that need uncensored, intelligent responses.

Overview

ghost-actual/Qwen3.5-4B-Claude-Opus-4.6-Distilled-heretic is a 4.5 billion parameter model developed by Ghost, built on the Qwen3.5 architecture. It carries reasoning behavior distilled from Claude Opus 4.6, and its safety refusals were removed with the Heretic tool. The reported results are a refusal rate of 4/100 and a KL divergence of 0.0680. A KL divergence this low suggests that the abliterated model's output distribution stays close to that of the original model, so little of the original model's capability should have been lost.
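
To make the KL divergence figure above more concrete, here is a minimal sketch of how a KL divergence between two next-token distributions can be computed. The probabilities are hypothetical and chosen only for illustration. The card does not state which prompts, token positions, or logarithm base were used for the reported 0.0680, so treat this as an illustration of the metric rather than a reproduction of that number.

```python
import math


def kl_divergence(p, q):
    """Return KL(P || Q) in nats for two discrete distributions.

    p and q are sequences of probabilities over the same outcomes
    (here: candidate next tokens). Terms with p_i == 0 contribute 0.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf  # Q assigns zero mass where P does not
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical next-token probabilities (NOT measured from this model).
original = [0.70, 0.20, 0.10]   # distribution from the base model
modified = [0.65, 0.23, 0.12]   # distribution after a small weight edit

divergence = kl_divergence(original, modified)
print(f"KL(original || modified) = {divergence:.4f} nats")
```

Identical distributions give a divergence of exactly 0, and the value grows as the modified model's predictions drift away from the original's, which is why a small value is read as evidence that behavior on ordinary prompts was largely preserved.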

Key Capabilities

  • Claude Opus 4.6 Reasoning: Inherits sophisticated chain-of-thought reasoning from Claude Opus 4.6 distillation.
  • Abliterated Safety Refusals: Engineered to provide uncensored responses without compromising reasoning quality.
  • Hybrid Architecture: Utilizes a Qwen3.5 Gated DeltaNet + conventional attention pattern, including native multimodal support.
  • Extended Context: Features a 262K native context length, extensible to over 1M tokens.
  • Efficient VRAM Usage: Runs on approximately 8-9 GB VRAM in BF16/FP16, making it accessible on consumer-grade GPUs like the RTX 3060.
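
The 8-9 GB figure above is consistent with a simple back-of-the-envelope estimate: BF16 and FP16 both store each parameter in 2 bytes. The sketch below covers weights only. It ignores the KV cache, activations, and framework overhead, which add to the real footprint and grow with context length. The 12 GB card size is an assumption made for the headroom calculation and is not taken from the card.

```python
def weight_memory_bytes(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights alone."""
    return num_params * bytes_per_param


NUM_PARAMS = 4.5e9        # model size stated on the card
BYTES_PER_PARAM_BF16 = 2  # BF16 and FP16 are both 16-bit formats

weights_bytes = weight_memory_bytes(NUM_PARAMS, BYTES_PER_PARAM_BF16)
weights_gb = weights_bytes / 1e9        # decimal gigabytes
weights_gib = weights_bytes / 1024**3   # binary gibibytes

# Assumption: a 12 GB consumer GPU; headroom is what remains for the
# KV cache, activations, and runtime overhead.
assumed_gpu_vram_gb = 12
headroom_gb = assumed_gpu_vram_gb - weights_gb

print(f"Weights: {weights_gb:.1f} GB ({weights_gib:.2f} GiB)")
print(f"Headroom on a {assumed_gpu_vram_gb} GB card: {headroom_gb:.1f} GB")
```

Because the remaining headroom is shared with the KV cache, long prompts can exhaust it well before the model's native context limit is reached.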

Good for

  • Applications requiring high-quality, uncensored reasoning in a compact 4.5B parameter footprint.
  • Scenarios where preserving Claude-style chain-of-thought is critical, but safety layers need to be minimized or removed.
  • Developers seeking a small, capable model that avoids common pitfalls of other uncensored alternatives (e.g., reduced intelligence or reliance on less capable distillation sources).