eggdog100/Qwable-v1-Qwen3.6-35B-A3B-abliterated
eggdog100/Qwable-v1-Qwen3.6-35B-A3B-abliterated is a 35.1 billion parameter Mixture-of-Experts (MoE) model derived from lordx64/Qwable-v1, which is based on Qwen3.6-35B-A3B. This model features a Gated-DeltaNet hybrid linear attention mechanism and an intact vision tower, making it multimodal. It has been specifically abliterated (refusal-suppressed) to reduce refusal behavior while maintaining coherence, making it suitable for applications requiring less restrictive content generation.
Loading preview...
Model Overview
This model, eggdog100/Qwable-v1-Qwen3.6-35B-A3B-abliterated, is a 35.1 billion parameter Mixture-of-Experts (MoE) model. It is a refusal-suppressed derivative of lordx64/Qwable-v1, which itself is based on the Qwen3.6-35B-A3B architecture. The model incorporates a Gated-DeltaNet hybrid linear attention and retains its original multimodal capabilities with an intact vision tower.
Key Differentiators & Improvements (v2)
This v2 release addresses significant issues from a previous version, which suffered from repetition collapse due to aggressive MoE router editing. The current version employs a refined abliteration method using abliterix v1.8.0, focusing on in-place direct weight editing of the attn.o_proj via orthogonal projection of the refusal direction. Crucially, the MoE router and experts are left untouched, preserving model coherence. The abliteration process was rigorously verified with actual generation tests to ensure stability and prevent repetition.
Performance & Capabilities
- Refusal Rate: Achieves a low refusal rate of 1/100 on adversarial prompts (thinking-off) and 1/94 (thinking-on, finished answers), significantly reduced from the base model's ~85-87/100.
- Coherence: Verified to have 0 collapses across extensive greedy and sampled generations.
- Multimodal: The vision tower remains untouched and fully functional.
- Benchmarks: While not a direct head-to-head, it achieves MMLU-Pro scores of 78.9 and GSM8K scores of ~95 (sampled, thinking-on).
Usage Notes
This is a "thinking model" and should be run with thinking enabled and Qwen sampling parameters (e.g., temperature=0.6, top_p=0.95, top_k=20). Users should avoid greedy decoding and large repetition/presence penalties to ensure optimal performance. Quantizations (GGUF) are available, including IQ2_XXS for smaller footprint, and are compatible with current llama.cpp / LM Studio / Ollama.