ApocalypseParty/G4-31B-SFT-v6-1

VISIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 21, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

ApocalypseParty/G4-31B-SFT-v6-1 is a 31 billion parameter instruction-tuned multimodal model from the Gemma 4 family developed by Google DeepMind. This model handles text and image inputs, generating text outputs, and features a 256K token context window. It is optimized for reasoning, coding, and agentic workflows, offering strong performance across various benchmarks including MMLU Pro and LiveCodeBench.

Loading preview...

Overview

ApocalypseParty/G4-31B-SFT-v6-1 is a 31 billion parameter instruction-tuned model from Google DeepMind's Gemma 4 family. It is a multimodal model capable of processing text and image inputs to generate text outputs, featuring a substantial 256K token context window. The model employs a hybrid attention mechanism combining local sliding window attention with global attention for efficient long-context processing.

Key Capabilities

  • Multimodality: Processes text and image inputs, with native support for interleaved multimodal prompts. Smaller Gemma 4 models (E2B, E4B) also support audio and video.
  • Reasoning: Designed with highly capable reasoning abilities, including configurable thinking modes.
  • Coding & Agentic Workflows: Shows significant improvements in coding benchmarks and supports native function calling for autonomous agents.
  • Long Context: Features a 256K token context window, enabling complex, long-context tasks.
  • Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
  • Native System Prompt Support: Introduces native support for the system role for more structured conversations.

Benchmark Highlights

  • Achieves 85.2% on MMLU Pro and 89.2% on AIME 2026 no tools.
  • Scores 80.0% on LiveCodeBench v6 and a Codeforces ELO of 2150.
  • Demonstrates strong vision capabilities with 76.9% on MMMU Pro and 85.6% on MATH-Vision.

Good for

  • Content Creation: Generating creative text, code, and marketing copy.
  • Conversational AI: Powering chatbots, virtual assistants, and interactive applications.
  • Research & Education: Serving as a foundation for VLM and NLP research, and language learning tools.
  • Image Understanding: Object detection, document parsing, OCR, and general visual data extraction.
  • Agentic Workflows: Utilizing function calling for structured tool use.