JMingo/gemma-4-31B-it

VISIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:May 27, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

JMingo/gemma-4-31B-it is a 30.7 billion parameter instruction-tuned multimodal language model developed by Google DeepMind, part of the Gemma 4 family. It features a 256K token context window and supports text and image inputs, excelling in reasoning, coding, and agentic capabilities. This model is well-suited for complex text generation, coding, and multimodal understanding tasks on consumer GPUs and workstations.

Loading preview...

Model Overview

JMingo/gemma-4-31B-it is a 30.7 billion parameter instruction-tuned model from the Gemma 4 family, developed by Google DeepMind. This model is a resaved version of google/gemma-4-31B-it with no modified weights, ensuring all credits remain with the original authors. It is a multimodal model capable of processing text and image inputs, generating text outputs, and supports a substantial 256K token context window.

Key Capabilities

  • Multimodality: Processes text and image inputs with variable aspect ratio and resolution support. Video processing is also supported by analyzing sequences of frames.
  • Reasoning: Designed as a highly capable reasoner with configurable thinking modes, enabling step-by-step problem-solving.
  • Coding & Agentic Capabilities: Achieves notable improvements in coding benchmarks and includes native function-calling support for autonomous agents.
  • Multilingual Support: Pre-trained on over 140 languages, with out-of-the-box support for 35+ languages.
  • Native System Prompt Support: Introduces native support for the system role for more structured and controllable conversations.

Good For

  • Complex Reasoning Tasks: Its design as a strong reasoner makes it suitable for tasks requiring logical deduction.
  • Code Generation and Assistance: Enhanced coding capabilities make it effective for code generation, completion, and correction.
  • Multimodal Understanding: Ideal for applications requiring the interpretation of both text and images, such as document parsing, screen understanding, and OCR.
  • Agentic Workflows: Native function-calling support facilitates the development of highly capable autonomous agents.