BinxNet/gemma-4-26B-A4B-it-heretic
BinxNet/gemma-4-26B-A4B-it-heretic is a 26-billion-parameter instruction-tuned causal language model based on Google DeepMind's Gemma 4 architecture, listed here with a 32,768-token context length. It is a decensored variant of google/gemma-4-26B-A4B-it, created with the Heretic v1.4.0 tool to reduce refusals compared to the original. It uses a Mixture-of-Experts (MoE) architecture with 3.8 billion active parameters per token, which keeps inference fast, and it accepts text, image, and video inputs.
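As a rough illustration of why 3.8 billion active parameters matter, the sketch below does the arithmetic from the two parameter counts stated above. It assumes the common rule of thumb that a transformer forward pass costs about 2 × N FLOPs per token for N participating parameters; that rule is an approximation, not a measured figure for this model.

```python
# Back-of-the-envelope figures from the parameter counts in the description.
# Assumption: the ~2 * N FLOPs-per-token rule of thumb for a forward pass;
# real cost also depends on attention, sequence length and implementation.

TOTAL_PARAMS = 26e9    # total parameters (stated in the model description)
ACTIVE_PARAMS = 3.8e9  # parameters active per token (stated in the model description)


def active_fraction(active: float, total: float) -> float:
    """Share of the weights that participate in computing one token."""
    return active / total


def approx_forward_flops(params: float) -> float:
    """Rough forward-pass cost per token using the ~2 * N rule of thumb."""
    return 2 * params


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = approx_forward_flops(ACTIVE_PARAMS)
dense_flops = approx_forward_flops(TOTAL_PARAMS)

print(f"active fraction: {frac:.1%}")  # about 14.6% of the weights per token
print(f"~{moe_flops:.2e} FLOPs/token vs ~{dense_flops:.2e} for a dense 26B model")
print(f"compute ratio: ~{dense_flops / moe_flops:.1f}x")
```

Note that the saving is in compute, not memory: all 26 billion parameters still have to be loaded, because different tokens may route to different experts.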
Overview
BinxNet/gemma-4-26B-A4B-it-heretic is a 26-billion-parameter instruction-tuned model derived from Google DeepMind's Gemma 4 family, specifically the 26B A4B variant. It has been processed with Heretic v1.4.0 to produce a decensored model, cutting refusals from 100/100 for the original to 25/100 for this version. It retains Gemma 4's multimodal capabilities, accepting text, image, and video inputs and generating text outputs.
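The refusal figures are counts out of 100, so they convert directly to rates. The minimal sketch below, assuming both counts come from the same 100-prompt evaluation set (whose contents are not described here), shows the resulting rates and the relative reduction. The figure only characterizes behavior on that prompt set.

```python
# Refusal counts as reported for the original and the Heretic version.
# Assumption: both counts were measured on the same set of 100 prompts.
ORIGINAL_REFUSALS = 100
HERETIC_REFUSALS = 25
NUM_PROMPTS = 100


def refusal_rate(refusals: int, prompts: int) -> float:
    """Fraction of prompts that were refused."""
    return refusals / prompts


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


before = refusal_rate(ORIGINAL_REFUSALS, NUM_PROMPTS)
after = refusal_rate(HERETIC_REFUSALS, NUM_PROMPTS)
reduction = relative_reduction(before, after)

print(f"{before:.0%} -> {after:.0%} ({reduction:.0%} fewer refusals)")
```

Here the 100% to 25% change corresponds to a 75% relative reduction in refusals on that evaluation set.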
Key Capabilities
- Decensored Output: Achieves a refusal rate of 25/100 compared to 100/100 for the original model, offering less restricted responses.
- Multimodal Understanding: Processes text, image, and video inputs, with support for variable image aspect ratios and resolutions.
- Efficient MoE Architecture: Uses a Mixture-of-Experts (MoE) design in which only 3.8 billion of its 26 billion total parameters are active per token, giving faster inference than a dense model of the same total size.
- Extended Context Window: Supports a context length of up to 256K tokens, allowing it to process long and complex inputs.
- Reasoning & Agentic Features: Designed with strong reasoning capabilities, native function-calling support, and a built-in thinking mode for step-by-step processing.
- Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
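To show how an MoE layer can skip most of its parameters for each token, here is a minimal pure-Python sketch of generic top-k expert routing. It illustrates the textbook technique only, not Gemma 4's actual router. The expert count (`NUM_EXPERTS`), `TOP_K`, hidden size (`HIDDEN`), and the toy linear "experts" are all hypothetical values chosen for illustration, since this card does not give them.

```python
import math
import random

# Hypothetical sizes for illustration only; Gemma 4's real expert count,
# top-k value and hidden size are not stated in this card.
NUM_EXPERTS = 8
TOP_K = 2
HIDDEN = 4


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def make_expert(seed):
    """A toy 'expert': a fixed random linear map HIDDEN -> HIDDEN."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

    def expert(x):
        return [sum(w[i][j] * x[j] for j in range(HIDDEN)) for i in range(HIDDEN)]

    return expert


def moe_layer(x, router_weights, experts, top_k=TOP_K):
    """Route one token vector to its top-k experts and mix their outputs.

    Returns the mixed output and the indices of the experts that ran.
    """
    # Router score per expert: dot product of the token with that expert's router row.
    logits = [sum(r[j] * x[j] for j in range(HIDDEN)) for r in router_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    gates = softmax([logits[e] for e in chosen])
    out = [0.0] * HIDDEN
    for g, e in zip(gates, chosen):
        y = experts[e](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
router = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(seed) for seed in range(NUM_EXPERTS)]
token = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

output, used = moe_layer(token, router, experts)
print("experts used:", used, "of", NUM_EXPERTS)
print("output:", [round(v, 3) for v in output])
```

Only `TOP_K` of the `NUM_EXPERTS` expert functions are called for each token, which is where the compute saving comes from; the unused experts still occupy memory because the next token may select them.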
Good For
This model is well suited to applications that need less restrictive content generation, multimodal understanding (especially of images and video), and efficient inference on complex tasks. Its lower refusal rate makes it usable where the original Gemma 4 model's refusals would be prohibitive, while its MoE architecture keeps per-token compute roughly in line with its 3.8 billion active parameters while retaining 26 billion total parameters.