Model Overview
This model, Gemma-3-1B-it-GLM-4.7-Flash-Heretic-Uncensored-Thinking, is a 1-billion-parameter Gemma-based instruction-tuned model developed by klinevanya. It is fine-tuned on the GLM 4.7 reasoning dataset for uncensored, deep-thinking behavior, aiming to provide direct, detailed responses without refusals. The model retains a 32k-token context length and exhibits stable reasoning across a wide temperature range (0.1 to 2.5).
Key Capabilities & Features
- Uncensored Output: Designed to generate content exactly as requested, refusing only 3 of 100 test prompts versus 99 of 100 for the original Gemma model.
- Deep Reasoning: Enhanced reasoning that carries through general operation, output generation, and benchmark performance; reasoning traces are compact yet detailed.
- Flexible Activation: Thinking can be triggered by prefixing a request with "think deeply:", or via specific system prompts and Jinja chat templates, though it often activates automatically.
- Temperature Stability: Reasoning remains stable across a broad temperature range.
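As a sketch of the "think deeply:" trigger described above, assuming Gemma's standard `<start_of_turn>` chat-turn format (the helper name is illustrative, not part of the model card):

```python
def make_thinking_prompt(user_text: str) -> str:
    """Wrap a request in Gemma's chat-turn format, prefixed with the
    'think deeply:' trigger that activates the model's reasoning mode."""
    return (
        "<start_of_turn>user\n"
        f"think deeply: {user_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = make_thinking_prompt("Explain why the sky is blue.")
print(prompt)
```

If a chat template is applied by the serving stack (e.g. via `tokenizer.apply_chat_template`), only the "think deeply:" prefix on the user message is needed; the turn markers are added automatically.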
Performance & Usage Notes
Benchmarks indicate competitive performance across various tasks, including ARC-Challenge (0.344), HellaSwag (0.504), and PIQA (0.720). For best results, the developer suggests using q5, q6, q8, or 16-bit precision quants, with IQ3_M as the minimum. A repetition penalty of 1.05 to 1.1 is recommended to prevent looping, especially with lower-quality quants. Despite being uncensored, the model's default output can be "tame"; generating highly graphic or explicit content may require explicit direction, such as using specific slang or terms in the prompt.
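The sampling recommendations above can be collected into a generation config. A minimal sketch, assuming the Hugging Face `transformers` keyword names (`temperature`, `repetition_penalty`); the specific values within the recommended ranges are illustrative:

```python
# Suggested generation settings from the usage notes above.
# repetition_penalty 1.05-1.1 guards against looping, especially on
# lower-quality quants; temperature is stable anywhere in 0.1-2.5.
gen_kwargs = {
    "max_new_tokens": 1024,      # illustrative budget for reasoning traces
    "do_sample": True,
    "temperature": 0.7,          # any value in the stable 0.1-2.5 range
    "repetition_penalty": 1.05,  # raise toward 1.1 if looping appears
}
```

These would be passed as `model.generate(**gen_kwargs)` in a `transformers` pipeline, or mapped to the equivalent llama.cpp flags (`--temp`, `--repeat-penalty`) when running GGUF quants.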