ApocalypseParty/G4-31B-SFT-v5-2

VISIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 11, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

ApocalypseParty/G4-31B-SFT-v5-2 is a 31 billion parameter instruction-tuned multimodal model from the Gemma 4 family by Google DeepMind. This model handles text, image, and video inputs, generating text outputs, and features a 256K token context window. It is optimized for reasoning, coding, and agentic capabilities, making it suitable for complex multimodal understanding tasks.

Loading preview...

What is ApocalypseParty/G4-31B-SFT-v5-2?

This is a 31 billion parameter instruction-tuned model from the Gemma 4 family, developed by Google DeepMind. It is a multimodal model capable of processing text, image, and video inputs to generate text outputs. The model features a substantial 256K token context window and is designed with a hybrid attention mechanism for efficient long-context processing.

Key Capabilities

  • Multimodal Understanding: Processes text, images (with variable aspect ratio and resolution), and video inputs. The E2B and E4B variants also support audio.
  • Advanced Reasoning: Includes a built-in reasoning mode that allows for step-by-step thinking before generating an answer.
  • Extended Context: Supports a context window of up to 256K tokens, enabling deep awareness for complex tasks.
  • Enhanced Coding & Agentic Features: Shows significant improvements in coding benchmarks and includes native function-calling support for autonomous agents.
  • Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
  • Native System Prompt Support: Allows for more structured and controllable conversations.

Should I use this for my use case?

This model is well-suited for applications requiring advanced multimodal understanding, complex reasoning, and robust coding capabilities. Its large context window and agentic features make it ideal for sophisticated AI workflows, content creation, and research in NLP and VLM. It is particularly strong for tasks like document parsing, screen understanding, and code generation where high detail and logical processing are crucial.