google/gemma-4-31B

Hugging Face
TEXT GENERATIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 12, 2026License:apache-2.0Architecture:Transformer0.4K Open Weights Warm

Gemma 4-31B is a 30.7 billion parameter multimodal language model developed by Google DeepMind, part of the Gemma 4 family. It processes text and image inputs, generating text outputs, and features a 256K token context window. This model is optimized for reasoning, coding, and agentic workflows, offering strong performance in complex tasks.

Loading preview...

Overview

Google DeepMind's Gemma 4 models are a family of open, multimodal models designed for text and image input, with text output. The Gemma 4-31B is a 30.7 billion parameter dense model, while the Gemma 4-26B A4B is a 25.2 billion parameter Mixture-of-Experts (MoE) model with 3.8 billion active parameters, offering faster inference. Both models support a 256K token context window and are multilingual, supporting over 140 languages.

Key Capabilities

  • Multimodality: Handles text and image inputs, with variable aspect ratio and resolution support. Smaller E2B/E4B models also support audio and video.
  • Reasoning: Designed with configurable thinking modes for step-by-step problem-solving.
  • Coding & Agentic Workflows: Enhanced performance in coding benchmarks and native function-calling support for autonomous agents.
  • Long Context: Supports up to 256K tokens, utilizing a hybrid attention mechanism for efficiency.
  • Native System Prompt Support: Enables more structured and controllable conversations.

Good For

  • Complex Reasoning Tasks: Excels in benchmarks like MMLU Pro (85.2%) and AIME 2026 (89.2%).
  • Code Generation: Achieves 80.0% on LiveCodeBench v6 and a Codeforces ELO of 2150.
  • Multimodal Understanding: Strong performance in MMMLU (88.4%) and MMMU Pro (76.9%) for vision-language tasks.
  • Content Creation: Generating creative text, marketing copy, and powering chatbots.
  • Research & Development: Serving as a foundation for VLM and NLP research.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p