dndll/GRM-2.6-Opus-Heretic-Abliterated-MTP
dndll/GRM-2.6-Opus-Heretic-Abliterated-MTP is a 27-billion-parameter causal language model based on the Qwen3.6 architecture. It is an 'abliterated' and repaired version of OrionLLM/GRM-2.6-Opus. It has a 32,768-token context length and applies SSM conv1d outlier repair and Heretic Abliteration to improve stability and reduce 'philosophizing' loops. The model is designed for general language tasks with a focus on long-context effectiveness, and it includes Multi-Token Prediction (MTP) capabilities.
Model Overview
dndll/GRM-2.6-Opus-Heretic-Abliterated-MTP is a 27-billion-parameter causal language model built on the Qwen3.6-27B base model. It is an 'abliterated' version of OrionLLM/GRM-2.6-Opus. It has a native context length of 32,768 tokens, extensible up to 1,010,000 tokens with YaRN scaling. The model incorporates several key modifications:
Key Differentiators
- SSM conv1d Outlier Repair: Addresses a defect in Qwen3.5/3.6 hybrids where certain SSM blocks had inflated weights, leading to coherence collapse and reasoning loops in long contexts. This repair rescales outlier weights to ensure uniform sigma across SSM layers.
- Heretic Abliteration: A process applied after outlier repair, achieving a KL divergence of 0.0096 and 5/100 refusals in testing, indicating a balance between performance and reduced refusal behavior.
- MTP Graft: Multi-Token Prediction (MTP) heads from the original BF16 model were grafted back to enhance generation efficiency.
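The outlier repair described above can be illustrated with a minimal sketch. It is not the developer's actual procedure: it assumes "sigma" means the standard deviation of each SSM block's conv1d weights, uses the median across layers as the target, and treats a layer as an outlier when its sigma exceeds a hypothetical `ratio_threshold` times that median. The layer names and the plain lists standing in for weight tensors are also illustrative.

```python
import statistics


def repair_conv1d_outliers(layer_weights, ratio_threshold=2.0):
    """Rescale layers whose weight std is far above the median std.

    layer_weights: dict mapping a layer name to a flat list of floats
        (a stand-in for each SSM block's conv1d weight tensor).
    ratio_threshold: hypothetical cutoff, not taken from the model card.
    Returns (repaired_weights, names_of_rescaled_layers).
    """
    # Per-layer sigma, assumed here to be the population standard deviation.
    sigmas = {name: statistics.pstdev(w) for name, w in layer_weights.items()}
    # The median is robust to the inflated layers we are trying to detect.
    target = statistics.median(sigmas.values())

    repaired, touched = {}, []
    for name, w in layer_weights.items():
        if target > 0 and sigmas[name] > ratio_threshold * target:
            # Multiplying every weight by c multiplies the std by |c|,
            # so this brings the layer's sigma down to the target.
            scale = target / sigmas[name]
            repaired[name] = [x * scale for x in w]
            touched.append(name)
        else:
            repaired[name] = list(w)
    return repaired, touched


# Toy example: the third layer has weights inflated by 10x.
weights = {
    "blocks.0.conv1d": [0.1, -0.1, 0.2, -0.2],
    "blocks.1.conv1d": [0.12, -0.08, 0.18, -0.22],
    "blocks.2.conv1d": [1.0, -1.0, 2.0, -2.0],
}
fixed, touched = repair_conv1d_outliers(weights)
print(touched)  # ['blocks.2.conv1d']
```

Only the inflated layer is rescaled. Layers already near the median are copied unchanged, which keeps the repair from disturbing healthy blocks.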
Capabilities & Performance
While the original GRM-2.6-Opus was noted for its GPQA Diamond performance, this 'abliterated' version aims to improve reasoning, which the developer notes was a weakness in prior Opus variants. The following benchmark results are reported for the base Qwen3.6-27B model:
- Coding Agent: Achieves 77.2% on SWE-bench Verified and 59.3% on Terminal-Bench 2.0.
- Knowledge: Scores 86.2% on MMLU-Pro and 93.5% on MMLU-Redux.
- STEM & Reasoning: Attains 87.8% on GPQA Diamond and 93.8% on HMMT Feb 25.
- Vision Language: Supports image and video input, with strong results in MMMU (82.9%) and RealWorldQA (84.1%).
Recommended Use Cases
- Long-Context Applications: Suited to tasks requiring extensive context processing, with native support for 32,768 tokens and extensibility to over 1 million tokens.
- Agentic Workflows: Excels in tool calling, with specific support for Qwen-Agent and Qwen Code for terminal-based AI agent applications.
- Multimodal Tasks: Capable of processing both text and visual inputs (images and videos), making it suitable for VQA and other multimodal reasoning tasks.
However, the developer notes that previous Opus variants were "pretty poor at reasoning" and "mostly gimmicks" for engineering work. This version is an attempt to address those limitations, and the developer's own testing of long-context effectiveness is still ongoing.