nbeerbower/Gemma4-Gutenberg-31B-Heretic

Vision · Concurrency Cost: 2 · Model Size: 31B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

nbeerbower/Gemma4-Gutenberg-31B-Heretic is a 31-billion-parameter Gemma 4 model fine-tuned by nbeerbower, with a 32,768-token context length. It specializes in literary, novelistic prose with controlled pacing and is trained to disprefer 'AI slop' phrasing. This 'Heretic' variant is abliterated (decensored), so it suits dark or mature creative writing prompts that standard instruct models might refuse.

Model Overview

nbeerbower/Gemma4-Gutenberg-31B-Heretic is a 31-billion-parameter Gemma 4 model built on the coder3101/gemma-4-31B-it-heretic base. It is fine-tuned for literary and novelistic prose, emphasizing story, interiority, and controlled pacing while avoiding generic "AI slop" phrasing. The context length is 32,768 tokens.

Key Differentiators

  • Literary Prose Specialization: Optimized for generating high-quality literary fiction, focusing on narrative depth and stylistic nuance.
  • "Heretic" Variant: This version is abliterated (decensored), so it can handle dark or mature creative prompts that typical instruct models might refuse. That makes it usable across a broader range of fiction genres.
  • Multimodal Base: Although the fine-tune targets text only, the underlying Gemma 4 31B base is a unified multimodal model. Its vision/audio towers were left frozen and intact, so this model remains a drop-in replacement for the base model, multimodal capabilities included.

Training Methodology

The model was fine-tuned with ORPO (Odds Ratio Preference Optimization) at a beta of 0.1 on the full schneewolflabs/Athanorlite-DPO dataset, which comprises 14,816 preference pairs. This dataset is a superset of several Gutenberg DPO recipes, including jondurbin/gutenberg-dpo-v0.1, nbeerbower/gutenberg2-dpo, gutenberg-moderne-dpo, human-writing-dpo, and synthetic-fiction-dpo, alongside balance sets for truthy, physical-reasoning, and theory-of-mind tasks. Training used a LoRA adapter (r=64, text decoder only) that was then merged into the full weights, and ran for 1 epoch with a max length of 2048 tokens. Reward accuracy rose from 0.175 to 0.9125 over the run.
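To make the objective concrete, here is a minimal sketch of the per-pair ORPO loss and of a reward-accuracy metric, written in plain Python. It is not the author's training code. It assumes the common formulation in which beta weights the odds-ratio term added to the negative log-likelihood of the chosen response, and in which each response is scored by its length-averaged log-probability. The function names (`log_odds`, `orpo_loss`, `reward_accuracy`) and the example numbers are hypothetical.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be the length-averaged log-probability of a
    response, so it must be strictly negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float,
              beta: float = 0.1) -> float:
    """Per-pair ORPO loss: NLL of the chosen response plus a weighted
    odds-ratio penalty (assumed formulation, see lead-in)."""
    nll = -chosen_avg_logp
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll + beta * odds_ratio_term


def reward_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) score pairs where chosen scores higher."""
    hits = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return hits / len(pairs)


# Hypothetical scores for four preference pairs (not real training data).
example_pairs = [(-0.5, -1.0), (-2.0, -1.0), (-0.3, -0.9), (-0.4, -0.8)]
print(reward_accuracy(example_pairs))
print(orpo_loss(-0.5, -2.0, beta=0.1))
```

Under this reading, a reward accuracy of 0.9125 would mean that the model scores the chosen response above the rejected one in roughly 91% of evaluated pairs.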