Overview
Gemma 3 27B Instruction-Tuned QAT Model
This model is the 27-billion-parameter instruction-tuned member of Google's Gemma 3 family, trained with Quantization Aware Training (QAT). The released checkpoint is unquantized, but QAT prepares it for Q4_0 quantization, so the quantized model preserves quality close to the full-precision checkpoint while significantly reducing memory requirements. Gemma 3 models are multimodal: they accept both text and image inputs and generate text outputs.
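As a concrete illustration, the sketch below loads a Q4_0 GGUF build of the model with llama-cpp-python. The file name and context size are placeholder assumptions rather than values from this card; any Q4_0 GGUF export of this checkpoint should work the same way.

```python
# Minimal sketch: running a Q4_0 GGUF build of the model via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-27b-it-q4_0.gguf",  # hypothetical local Q4_0 GGUF file
    n_ctx=8192,       # context to allocate; the model supports up to 128K tokens
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain Q4_0 quantization in two sentences."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```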
Key Capabilities
- Multimodal Input: Accepts text strings and images; each image is normalized to 896x896 resolution and encoded to 256 tokens (see the inference sketch after this list).
- Extensive Context: Features a large 128K token input context window.
- Multilingual Support: Trained on data in over 140 languages.
- Diverse Task Performance: Well-suited for question answering, summarization, reasoning, and image analysis.
- Optimized for Deployment: QAT makes the Q4_0-quantized model practical to run on resource-constrained hardware such as laptops and desktops.
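The following sketch shows multimodal inference through Hugging Face Transformers. It assumes a recent transformers release with Gemma 3 support and uses the `google/gemma-3-27b-it` hub id; verify both against the actual release. The processor handles resizing each image to 896x896 and encoding it to its fixed 256-token representation.

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-27b-it"  # assumed Hugging Face hub id

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "photo.jpg"},  # local path or URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Because every image collapses to a fixed 256-token representation, the context cost of image inputs within the 128K window is predictable regardless of input resolution.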
Performance Highlights (Gemma 3 PT 27B pre-trained checkpoint)
- Reasoning: Achieves 85.6 on HellaSwag (10-shot), 85.5 on TriviaQA (5-shot), and 77.7 on BIG-Bench Hard (few-shot).
- STEM & Code: Scores 78.6 on MMLU (5-shot), 82.6 on GSM8K (8-shot), and 48.8 on HumanEval (0-shot).
- Multimodal: Demonstrates strong performance on benchmarks like COCOcap (116), DocVQA (85.6), and MMMU (56.1).
Intended Usage
This model is designed for a wide range of applications, including content creation (text generation, marketing copy), conversational AI (chatbots, virtual assistants), text summarization, and extracting data from images. It can also serve as a foundation for vision-language model (VLM) and NLP research and for language-learning tools.