vmo247/vmo-gemma-4-26b-a4b-ft

VISIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 24, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The vmo247/vmo-gemma-4-26b-a4b-ft is a 26 billion parameter multimodal Mixture-of-Experts (MoE) model from the Gemma 4 family by Google DeepMind, featuring 3.8 billion active parameters for efficient inference. This model excels in reasoning, coding, and multimodal understanding, processing text, images, and video with a 256K token context window. It is designed for scalable deployment on consumer GPUs and workstations, offering enhanced agentic capabilities and native function-calling support.

Loading preview...

Gemma 4 26B A4B MoE Overview

This model is part of the Gemma 4 family developed by Google DeepMind, offering a 26 billion parameter Mixture-of-Experts (MoE) architecture with 3.8 billion active parameters. This design allows for faster inference, performing almost as quickly as a 4B-parameter model while leveraging the capabilities of a larger model. It supports a substantial 256K token context window and is multimodal, capable of processing text, images, and video inputs to generate text outputs. The model is built with a hybrid attention mechanism for efficient long-context processing.

Key Capabilities

  • Reasoning: Designed with configurable thinking modes for enhanced problem-solving.
  • Multimodality: Processes text, images (with variable aspect ratio and resolution), and video, allowing for interleaved inputs.
  • Coding & Agentic Capabilities: Achieves strong performance in coding benchmarks and includes native function-calling support for autonomous agents.
  • Long Context: Features a 256K token context window, suitable for complex, long-context tasks.
  • Native System Prompt Support: Enables more structured and controllable conversations.

Good For

  • Reasoning-intensive tasks: Leveraging its built-in thinking mode.
  • Multimodal applications: Integrating text, image, and video understanding.
  • Code generation and agentic workflows: Due to its enhanced coding and function-calling support.
  • Deployment on consumer GPUs: Offering efficient performance for its size.