Gemma 3 (27B) Overview
Gemma 3 is a family of lightweight, open multimodal models from Google DeepMind, built on the same research and technology as the Gemini models. This 27-billion-parameter pre-trained variant accepts both text and image inputs and generates text outputs. It offers a 128K-token input context window and multilingual support covering over 140 languages.
Key Capabilities
- Multimodal Processing: Handles text and image inputs (images normalized to 896x896 resolution, encoded to 256 tokens each).
- Extensive Context: A 128K-token input context window allows the model to process long documents and complex queries.
- Multilingual Support: Trained on a diverse dataset including web documents in over 140 languages.
- Broad Task Performance: Excels in text generation, image understanding, question answering, summarization, and reasoning tasks.
- Robust Training: The 27B model was trained on 14 trillion tokens, incorporating web documents, code, and mathematical texts.
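Since each image costs a fixed 256 tokens after encoding, the share of the 128K window left for text is simple to estimate. A minimal sketch using the figures from the capability list above (the function name and the exact window size of 128,000 are illustrative assumptions; the true limit may be 131,072 tokens):

```python
# Sketch: estimate how much of Gemma 3's context window remains after
# accounting for image inputs. Each image is resized to 896x896 and
# encoded to a fixed 256 tokens, per the capability list above.

CONTEXT_WINDOW = 128_000   # assumed window size; may actually be 131,072
TOKENS_PER_IMAGE = 256     # fixed cost per encoded image

def remaining_text_budget(num_images: int, text_tokens: int = 0) -> int:
    """Return how many input tokens are still free after the given
    images and text, raising if the prompt would overflow the window."""
    used = num_images * TOKENS_PER_IMAGE + text_tokens
    if used > CONTEXT_WINDOW:
        raise ValueError(f"prompt needs {used} tokens; window is {CONTEXT_WINDOW}")
    return CONTEXT_WINDOW - used

# e.g. ten images plus a 2,000-token text prompt:
print(remaining_text_budget(10, 2_000))  # 128000 - 2560 - 2000 = 123440
```

This kind of budget check is useful when batching many images into a single prompt, since image cost is constant regardless of image content.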
Benchmarks and Performance
The Gemma 3 PT 27B model demonstrates strong performance across various benchmarks:
- Reasoning & Factuality: Achieves 85.6 on HellaSwag (10-shot) and 85.5 on TriviaQA (5-shot).
- STEM & Code: Scores 78.6 on MMLU (5-shot), 82.6 on GSM8K (8-shot), and 48.8 on HumanEval (0-shot).
- Multilingual: Reaches 74.3 on MGSM and 75.7 on Global-MMLU-Lite.
- Multimodal: Attains 85.6 on DocVQA (val) and 72.9 on VQAv2.
Intended Usage
This model is well-suited for content creation (text generation, chatbots), research (NLP/VLM research, language learning tools), and image data extraction. Its relatively small size and open weights make it ideal for deployment on laptops, desktops, or cloud infrastructure, democratizing access to advanced AI capabilities.