Overview
Google DeepMind's Gemma 4 models are a family of open-weight, multimodal LLMs; the 31B variant uses a dense architecture. The models accept text and image input (with audio support on the smaller E2B/E4B models) and produce text output, with a context window of up to 256K tokens. Gemma 4 brings notable advances in reasoning, extended multimodality, and improved coding and agentic capabilities, including native function calling and system-prompt support.
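Since the models take a system prompt plus mixed text-and-image input, a request can be sketched as a structured message list. This is a minimal illustration assuming an OpenAI-style messages schema; the model ID "gemma-4-31b" and the exact content field names are assumptions, not a documented Gemma 4 API.

```python
def build_request(system_prompt, user_text, image_url=None, max_tokens=1024):
    """Assemble a hypothetical multimodal chat request payload."""
    # User content is a list so text and image parts can be mixed.
    content = [{"type": "text", "text": user_text}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": "gemma-4-31b",  # assumed model ID for illustration
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content},
        ],
    }
```

The same structure extends to multi-turn conversations by appending further `user`/`assistant` messages before sending the payload to an inference endpoint.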
Key Capabilities
- Multimodal Understanding: Processes text, images (with variable aspect ratio and resolution), and video. The E2B and E4B models also natively support audio.
- Advanced Reasoning: Configurable thinking modes enable step-by-step reasoning across all model sizes.
- Extended Context Window: Supports up to 256K tokens for the 26B A4B and 31B models, and 128K for smaller models.
- Enhanced Coding & Agentic Features: Improved coding benchmarks and native function-calling for autonomous agents.
- Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
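Native function calling typically works by giving the model a tool schema and parsing the structured call it emits. The sketch below is a hedged illustration: the JSON call shape (`{"name": ..., "arguments": {...}}`) is modeled on common function-calling conventions and is an assumption, not the documented Gemma 4 wire format.

```python
import json

# Example tool schema (hypothetical), in JSON-Schema style.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch_tool_call(raw, registry):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(raw)          # model output, assumed to be JSON
    fn = registry[call["name"]]     # look up the registered function
    return fn(**call["arguments"])  # invoke with the model's arguments
```

In an agent loop, the tool's return value would be fed back to the model as a new message so it can continue reasoning with the result.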
Good For
- Complex Reasoning Tasks: Leveraging its built-in reasoning mode for step-by-step problem-solving.
- Multimodal Applications: Integrating text and image inputs for tasks like object detection, document parsing, and UI understanding.
- Coding and Agentic Workflows: Generating, completing, and correcting code, and powering autonomous agents with function-calling.
- Long-Context Applications: Handling extensive documents or conversations due to its large context window.
- Research and Development: Serving as a foundation for VLM and NLP research, and developing advanced AI applications.
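For long-context applications, it helps to budget input against the 256K-token window before sending a request. A rough sketch follows; the 4-characters-per-token heuristic is an assumption for illustration, and real counts require the model's tokenizer.

```python
CONTEXT_TOKENS = 256_000   # upper bound for the 26B A4B and 31B models
CHARS_PER_TOKEN = 4        # coarse heuristic, not a tokenizer measurement

def fits_in_context(text, reserved_for_output=4_096):
    """Estimate whether a prompt plus output budget fits the window."""
    est_tokens = len(text) // CHARS_PER_TOKEN
    return est_tokens + reserved_for_output <= CONTEXT_TOKENS

def chunk_text(text, chunk_tokens=100_000):
    """Split oversized input into fixed-size chunks by estimated tokens."""
    step = chunk_tokens * CHARS_PER_TOKEN
    return [text[i:i + step] for i in range(0, len(text), step)]
```

Documents that fail the check can be processed chunk by chunk, with results merged in a final summarization pass.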