Model Overview
DavidAU's gemma-3-12b-it-GLM-Flash-4.7-Heretic-Thinking-Uncensored-GRANDE is a 12-billion-parameter Gemma 3 instruction-tuned model fine-tuned for deep, detailed reasoning. Trained on the GLM 4.7 Flash reasoning dataset via Unsloth, it is designed to give direct, uncensored responses without refusals, offering a high degree of freedom in content generation. It supports a 32,768-token context length and remains stable across a wide temperature range (0.1 to 2.5).
Key Capabilities
- Uncensored Output: Provides responses without guardrails or refusals, even for sensitive or explicit content, requiring only minimal direction for desired graphic levels.
- Enhanced Reasoning: Trained to produce compact yet highly detailed reasoning, which improves general model operation, output generation, image processing, and benchmark performance.
- Flexible Activation: Reasoning can be activated via specific prompts like "think deeply:" or through optional system prompts and specialized Jinja templates for always-on thinking.
- Modified Architecture: Features a modified Gemma 12B structure allowing separate training and quantization of the LM_HEAD (output tensor) and embed layers, leading to slightly larger GGUF quants.
Benchmarks & Performance
While full benchmark results are still pending, an initial ARC-C score of 0.573 improves on the uncensored base model's 0.534. Heretic de-censoring statistics show a very low KL divergence of 0.0826 and far fewer refusals (7/100) than the original google/gemma-3-12b-it (98/100).
Optimal Usage
For smoother operation and enhanced chat/roleplay, set "Smoothing_factor" to 1.5 in interfaces such as KoboldCpp, oobabooga/text-generation-webui, or SillyTavern. If smoothing is not used, increasing the repetition penalty to 1.1-1.15 is suggested instead.
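The recommended settings can be sketched as a request payload for a KoboldCpp-style generate endpoint. The field names (`smoothing_factor`, `rep_pen`) are assumptions based on KoboldCpp's API conventions; check them against the documentation of your KoboldCpp version before use.

```python
import json

# Sketch: sampler settings from this model card, expressed as a
# KoboldCpp-style generate payload. Field names (smoothing_factor,
# rep_pen) are assumed to match the KoboldCpp API; verify before use.

def make_payload(prompt: str, use_smoothing: bool = True) -> dict:
    payload = {
        "prompt": prompt,
        "max_length": 512,
        "temperature": 1.0,  # the model is stated stable from 0.1 to 2.5
    }
    if use_smoothing:
        payload["smoothing_factor"] = 1.5
    else:
        # Fallback suggested when smoothing is not used (range 1.1-1.15).
        payload["rep_pen"] = 1.1
    return payload

print(json.dumps(make_payload("think deeply: outline a short story."), indent=2))
```

Equivalent settings can be entered directly in the sampler panels of oobabooga/text-generation-webui or SillyTavern; only one of smoothing or the raised repetition penalty is needed, not both.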