prithivMLmods/gemma-4-26B-A4B-it-Uncensored-MAX

  • Capabilities: Vision
  • Concurrency Cost: 2
  • Model Size: 26B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Apr 4, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Status: Open Weights, Cold

prithivMLmods/gemma-4-26B-A4B-it-Uncensored-MAX is a 26-billion-parameter Mixture-of-Experts (MoE) language model packaged for efficient inference and stable deployment. It is built on huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated, with updated shard sizing and repository changes for compatibility with recent Transformers releases. The original Gemma MoE architecture and reasoning capabilities are preserved, which makes the model suitable for research into large-scale transformer behavior and for high-performance deployment.


Model Overview

prithivMLmods/gemma-4-26B-A4B-it-Uncensored-MAX is a repackaged 26-billion-parameter Mixture-of-Experts (MoE) language model derived from huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated. This release changes packaging only, not the core model weights or architecture, so its behavior should stay consistent with its base lineage.

Key Optimizations & Features

  • Latest Transformers Compatibility: Re-sharded and optimized to work seamlessly with recent versions of the Hugging Face Transformers library.
  • Optimized Model Sharding: Features an updated shard structure designed for improved storage handling, more reliable downloads, and enhanced inference efficiency.
  • Stable Inference Pipeline: Packaged for consistent loading and generation across various environments, improving deployment stability.
  • 26B MoE Architecture: Leverages the Mixture-of-Experts design for scalable reasoning capacity, inherited from the gemma-4-26B-A4B-it base.
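
The card does not say how the shards were produced. As a rough illustration of what size-based sharding means, the sketch below greedily groups tensors, in order, into shards under a byte limit. The function name `plan_shards`, the tensor names, and the sizes are all hypothetical and are not taken from this repository; Transformers exposes a comparable limit through the `max_shard_size` argument of `save_pretrained`, but this is not that implementation.

```python
from typing import Dict, List


def plan_shards(tensor_sizes: Dict[str, int], max_shard_bytes: int) -> List[List[str]]:
    """Greedily group tensors, in insertion order, into shards.

    A new shard is started whenever adding the next tensor would push the
    current shard over max_shard_bytes. A single tensor larger than the
    limit still gets a shard of its own.
    """
    shards: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in tensor_sizes.items():
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# Hypothetical tensor names and byte sizes, for illustration only.
example_sizes = {
    "embed": 400,
    "layer0.attn": 300,
    "layer0.experts": 900,
    "layer1.attn": 300,
    "layer1.experts": 900,
    "lm_head": 400,
}

for index, shard in enumerate(plan_shards(example_sizes, max_shard_bytes=1000)):
    print(index, shard)
```

Smaller shards mean more files but cheaper retries when a download is interrupted; larger shards mean fewer files but more data to re-fetch on failure.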

Intended Use Cases

This model is primarily intended for:

  • Multimodal and Language Research: Ideal for studying the behavior of large-scale transformer and MoE architectures.
  • Red-Teaming & Evaluation: Suitable for testing model robustness against complex and adversarial prompts.
  • High-Performance Deployment: Designed for running large models efficiently on optimized GPU or distributed inference setups.
  • Research Prototyping: Useful for experimentation with scalable transformer architectures.

Limitations

As a 26B MoE model, it requires significant GPU memory and optimized inference strategies. Performance is highly dependent on hardware and runtime optimization. Users should be aware of potential output variability and general model limitations, including the possibility of incorrect or inconsistent outputs.
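
To make the memory requirement concrete, the following back-of-the-envelope sketch estimates the space needed for the weights alone. It assumes exactly 26e9 parameters and a fixed number of bytes per parameter (1 for FP8, 2 for BF16); the helper name `weight_memory_gib` is hypothetical. It ignores the KV cache, activations, and runtime overhead, so real usage will be higher. In a typical MoE deployment all experts stay resident in memory even though only a subset is active per token, so the total parameter count, not the active count, drives this figure.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the approximate weight footprint in GiB (weights only)."""
    return num_params * bytes_per_param / 1024**3


TOTAL_PARAMS = 26e9  # assumed from the "26B" model size on this card

for label, bytes_per_param in [("FP8", 1), ("BF16", 2)]:
    gib = weight_memory_gib(TOTAL_PARAMS, bytes_per_param)
    print(f"{label}: ~{gib:.1f} GiB for weights")
```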