google/gemma-3-27b-pt

Status: Warm · Visibility: Public · Modality: Vision · Parameters: 27B · Quantization: FP8 · Context: 32768 · License: gemma · Weights: Hugging Face (Gated)
Overview

Gemma 3: Multimodal Foundation Models by Google

Gemma 3 is a family of open, lightweight, state-of-the-art multimodal models from Google, built on the same research and technology as the Gemini models. google/gemma-3-27b-pt is the 27-billion-parameter pre-trained (base, not instruction-tuned) variant in this family; it accepts both text and image inputs and generates text outputs.

Key Capabilities & Features

  • Multimodal Input: Processes text strings and images (normalized to 896x896 resolution, encoded to 256 tokens each).
  • Large Context Window: Supports a total input context of 128K tokens, enabling comprehensive understanding of long inputs.
  • Multilingual Support: Trained on data in over 140 languages, enhancing its global applicability.
  • Diverse Training Data: Pre-trained on 14 trillion tokens, including web documents, code, mathematics, and images, ensuring broad knowledge and reasoning abilities.
  • Strong Performance: Achieves competitive benchmark results across reasoning, STEM, code, and multimodal tasks, including 85.6 on HellaSwag, 78.6 on MMLU, and 85.6 on DocVQA.
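The image and context figures above imply simple prompt-budget arithmetic: each image costs a flat 256 tokens, and everything must fit in the 128K-token window. A minimal sketch of that check follows; the helper name `prompt_fits` is hypothetical, and in practice the text token count would come from the model's tokenizer rather than being passed as an integer.

```python
# Illustrative budget check for Gemma 3 multimodal prompts.
# Assumptions from the model card: each image is normalized to 896x896
# and encoded to a flat 256 tokens; total input context is 128K tokens.
IMAGE_TOKENS = 256
CONTEXT_WINDOW = 128 * 1024  # 131,072 tokens

def prompt_fits(text_tokens: int, num_images: int) -> bool:
    """Return True if text plus images fits in the context window."""
    total = text_tokens + num_images * IMAGE_TOKENS
    return total <= CONTEXT_WINDOW

# A 100K-token document plus 16 images still fits:
# 100_000 + 16 * 256 = 104_096 <= 131_072.
print(prompt_fits(100_000, 16))  # True
# But 131K tokens of text plus one image does not:
# 131_000 + 256 = 131_256 > 131_072.
print(prompt_fits(131_000, 1))   # False
```

Because the per-image cost is fixed rather than resolution-dependent, this budget stays predictable no matter what size images are submitted.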

Good For

  • Content Creation: Generating creative text formats, marketing copy, and email drafts.
  • Conversational AI: Powering chatbots and virtual assistants.
  • Text Summarization: Creating concise summaries of documents and research papers.
  • Image Understanding: Extracting, interpreting, and summarizing visual data into text.
  • Research & Education: Serving as a foundation for VLM and NLP research, and supporting language learning tools.