Goekdeniz-Guelmez/Qwen3-4B-Sky-High-Hermes-gabliterated

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Jan 22, 2026 | Architecture: Transformer

Goekdeniz-Guelmez/Qwen3-4B-Sky-High-Hermes-gabliterated is a 4-billion-parameter model based on the Qwen3 series. Goekdeniz-Guelmez modified it with Gabliteration, a novel neural weight modification technique. Gabliteration uses adaptive multi-directional projections with regularized layer selection to alter model behavior precisely, specifically targeting refusal patterns. The model is notable as the first 4B model to achieve a W/10 score of 9.8 on the UGI benchmark, which shows how much control the technique gives over the model's refusals.


Gabliterated Model Series: Qwen3-4B-Sky-High-Hermes-gabliterated

This model introduces Gabliteration, a novel neural weight modification technique developed by Goekdeniz-Guelmez. Gabliteration goes beyond traditional abliteration methods by employing adaptive multi-directional projections with regularized layer selection. It is designed to modify specific behavioral patterns without compromising overall model quality, a limitation of earlier methods.

Key Capabilities & Features

  • Novel Gabliteration Technique: Uses adaptive multi-directional projections and regularized layer selection to alter model behavior precisely, particularly refusal patterns.
  • Enhanced Behavioral Control: Addresses a fundamental limitation of existing abliteration methods, namely the difficulty of modifying specific behavioral patterns without degrading overall model quality.
  • Scalability: The Gabliteration technique has been demonstrated across model sizes ranging from 0.6B to 32B parameters.
  • Layer Selection: Although Gabliteration supports adaptive layer selection, this specific model was created with a fixed selection: layer 19 (out of 36 total layers) was empirically tuned and selected for modification.
  • Benchmark Performance: Achieves a W/10 score of 9.8 on UGI benchmarks, making it the first 4B model to reach this score.
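The core weight edit behind projection-based abliteration can be sketched in a few lines. This is a minimal illustration, not the author's implementation. It assumes the refusal directions are already given as the columns of a matrix `R`. It removes their span from the output space of a weight matrix stored in the (out_features, in_features) layout used by PyTorch's `nn.Linear`, and it edits only one layer, mirroring the fixed choice of layer 19 described above. The function name `project_out_directions`, the strength parameter `alpha`, the toy dimensions, and the zero-based layer indexing are all hypothetical. Gabliteration's adaptive and regularized components are not reproduced here.

```python
import numpy as np

def project_out_directions(W, R, alpha=1.0):
    """Remove the part of W's output space that lies in span(R).

    W: (d_model, d_in) weight matrix whose outputs write to the residual stream.
    R: (d_model, k) refusal directions (need not be orthonormal).
    alpha: hypothetical projection strength; 1.0 removes the span completely.
    """
    # Orthonormalize the directions so Q @ Q.T is a proper projector onto span(R).
    Q, _ = np.linalg.qr(R)
    # W' = W - alpha * Q Q^T W
    return W - alpha * Q @ (Q.T @ W)

# Hypothetical toy setup: 36 layers with random weights; only layer 19 is edited.
rng = np.random.default_rng(0)
d_model, d_in, k, n_layers = 16, 8, 3, 36
weights = {i: rng.standard_normal((d_model, d_in)) for i in range(n_layers)}
R = rng.standard_normal((d_model, k))  # stand-in for extracted refusal directions

target_layer = 19
weights[target_layer] = project_out_directions(weights[target_layer], R)

# With alpha=1.0 the edited layer can no longer write along any column of R,
# so this maximum is ~0 up to floating-point error.
print(np.abs(R.T @ weights[target_layer]).max())
```

Setting `alpha` below 1.0 removes the directions only partially. This is one simple way to trade the size of the behavioral change against staying close to the original weights.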

Technical Background

Building on the work of Arditi et al. (2024), Gabliteration extends single-direction abliteration to a comprehensive multi-directional framework. It applies singular value decomposition (SVD) to difference matrices between harmful and harmless prompt representations to extract multiple refusal directions instead of one. The author reports theoretical guarantees for this approach.
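The multi-direction extraction step can be illustrated with a minimal NumPy sketch. It makes several assumptions. Hidden states for harmful and harmless prompts are collected at one layer and paired row by row. The top-`k` right singular vectors of their difference matrix are then taken as the refusal directions. The function `refusal_directions`, the row pairing, and the synthetic data are hypothetical stand-ins for real model activations, and the exact construction used by Gabliteration may differ.

```python
import numpy as np

def refusal_directions(h_harmful, h_harmless, k):
    """Return (directions, singular_values) for the top-k refusal directions.

    h_harmful, h_harmless: (n, d_model) hidden states from one layer,
    paired row by row (an assumption of this sketch).
    directions: (d_model, k) matrix with orthonormal columns.
    """
    D = h_harmful - h_harmless  # difference matrix, shape (n, d_model)
    # Rows of Vt are right singular vectors, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T, S[:k]

# Synthetic stand-in for activations: harmful states are harmless states
# shifted along two hidden directions, plus a little noise.
rng = np.random.default_rng(1)
n, d_model = 64, 16
true_dirs, _ = np.linalg.qr(rng.standard_normal((d_model, 2)))  # (16, 2), orthonormal
h_harmless = rng.standard_normal((n, d_model))
shift = (5.0 * rng.standard_normal((n, 2))) @ true_dirs.T
h_harmful = h_harmless + shift + 0.01 * rng.standard_normal((n, d_model))

dirs, S = refusal_directions(h_harmful, h_harmless, k=2)
# The recovered directions span (almost) the same subspace as true_dirs,
# so the part of true_dirs outside span(dirs) is small.
residual = true_dirs - dirs @ (dirs.T @ true_dirs)
print(dirs.shape, np.linalg.norm(residual))
```

With `k = 1` the sketch keeps only the single dominant direction. This is roughly analogous to the single-direction setting of Arditi et al. (2024), who use a difference of means rather than an SVD.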

Use Cases

This model is particularly suited for applications requiring fine-grained control over model responses, especially in mitigating unwanted behaviors like refusals, while maintaining high performance. Its unique modification technique makes it valuable for research into model safety and alignment.