google/gemma-3-27b-it-qat-q4_0-unquantized
Source: Hugging Face

Vision · Model Size: 27B · Quant: FP8 · Context Length: 32K · Published: Apr 15, 2025 · License: Gemma · Architecture: Transformer

The google/gemma-3-27b-it-qat-q4_0-unquantized model is a 27 billion parameter instruction-tuned multimodal language model from Google's Gemma 3 family, built from the same research and technology behind the Gemini models. It accepts text and image inputs and generates text outputs, with a 128K token context window and multilingual support spanning more than 140 languages. This specific checkpoint is unquantized, but it was produced with quantization-aware training (QAT): when quantized to Q4_0 it retains quality close to the bfloat16 baseline at a much smaller memory footprint. It performs well on diverse text generation, image understanding, and reasoning tasks, making it suitable for deployment in resource-limited environments.


Gemma 3 27B Instruction-Tuned QAT Model

This model is the 27 billion parameter instruction-tuned version of Google's Gemma 3 family, trained with Quantization Aware Training (QAT). The provided checkpoint is unquantized, but it is optimized for Q4_0 quantization: quantizing it preserves quality close to the bfloat16 original while substantially reducing memory requirements. Gemma 3 models are multimodal, processing both text and image inputs to generate text outputs.
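To make the memory claim concrete, here is a back-of-the-envelope estimate comparing the weight storage of the 27B model in bfloat16 versus Q4_0. The 4.5 bits/weight figure assumes the llama.cpp/GGUF Q4_0 layout (blocks of 32 four-bit weights plus one fp16 scale, 18 bytes per block); real memory use also includes the KV cache and activations, which this sketch ignores.

```python
# Rough weight-memory estimate for a 27B-parameter model.
# Q4_0 (GGUF layout): 32 x 4-bit weights + one fp16 scale per block
# = 18 bytes / 32 weights = 4.5 bits per weight (approximation;
# KV cache, activations, and any unquantized tensors are ignored).

PARAMS = 27e9

def model_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Weight storage in GB for a given precision."""
    return params * bits_per_weight / 8 / 1e9

bf16_gb = model_gb(16)   # bfloat16: 16 bits per weight
q4_0_gb = model_gb(4.5)  # Q4_0: ~4.5 bits per weight

print(f"bf16 : {bf16_gb:.1f} GB")      # ~54 GB
print(f"Q4_0 : {q4_0_gb:.1f} GB")      # ~15 GB
print(f"saved: {1 - q4_0_gb / bf16_gb:.0%}")
```

The roughly 72% reduction in weight memory is what brings a 27B model within reach of a single consumer GPU or a well-equipped desktop.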

Key Capabilities

  • Multimodal Input: Accepts text strings and images (normalized to 896x896 resolution, encoded to 256 tokens each).
  • Extensive Context: Features a large 128K token input context window.
  • Multilingual Support: Trained on data in over 140 languages.
  • Diverse Task Performance: Well-suited for question answering, summarization, reasoning, and image analysis.
  • Optimized for Deployment: Its QAT design makes it suitable for deployment in environments with limited resources like laptops or desktops.
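Because each image costs a flat 256 tokens after normalization to 896x896, it is easy to budget a multimodal prompt against the 128K window. The helper below is illustrative only (the function name and the reserved-output default are assumptions, not part of any Gemma API); it uses only the numbers stated on this card.

```python
# Prompt-budget sketch using the figures from this card:
# each image is encoded as a flat 256 tokens; the input context
# window is 128K tokens. Helper name and defaults are illustrative.

CONTEXT_WINDOW = 128_000
TOKENS_PER_IMAGE = 256

def remaining_text_tokens(num_images: int, reserved_output: int = 2048) -> int:
    """Tokens left for text after images and a reserved generation budget."""
    used = num_images * TOKENS_PER_IMAGE + reserved_output
    if used > CONTEXT_WINDOW:
        raise ValueError("prompt exceeds the context window")
    return CONTEXT_WINDOW - used

print(remaining_text_tokens(4))  # 4 images + 2048 output -> 124928 for text
```

Even a prompt with dozens of images leaves the vast majority of the window available for text.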

Performance Highlights (Gemma 3 PT 27B)

  • Reasoning: Achieves 85.6 on HellaSwag (10-shot), 85.5 on TriviaQA (5-shot), and 77.7 on BIG-Bench Hard (few-shot).
  • STEM & Code: Scores 78.6 on MMLU (5-shot), 82.6 on GSM8K (8-shot), and 48.8 on HumanEval (0-shot).
  • Multimodal: Demonstrates strong performance on benchmarks like COCOcap (116), DocVQA (85.6), and MMMU (56.1).

Intended Usage

This model is designed for a wide range of applications including content creation (text generation, marketing copy), conversational AI (chatbots, virtual assistants), text summarization, and image data extraction. It also serves as a foundation for VLM and NLP research and language learning tools.
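For conversational and image-understanding use cases, many serving stacks expose models like this one through an OpenAI-compatible chat-completions interface. The sketch below only constructs such a request payload; the message shape follows that widely used convention, not a Gemma-specific API, and the image URL is a placeholder.

```python
import json

# Sketch of a multimodal chat request in the common OpenAI-compatible
# chat-completions format. The image URL is a placeholder; the model
# name matches this card. POSTing the body to a server's
# /v1/chat/completions endpoint is left out of this sketch.

payload = {
    "model": "google/gemma-3-27b-it-qat-q4_0-unquantized",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Summarize this chart in one sentence."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    "max_tokens": 256,
}

body = json.dumps(payload)
print(body[:60])
```

Text-only requests use the same shape with a plain string (or a single text part) as the message content.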