Overview
Gemma 3 4B Instruction-Tuned (QAT)
This model is a 4.3-billion-parameter instruction-tuned variant of the Gemma 3 family, developed by Google DeepMind. It leverages Quantization-Aware Training (QAT) to achieve quality comparable to the bfloat16 checkpoint while drastically reducing memory footprint, making it suitable for deployment on resource-limited devices such as laptops and desktops. The checkpoint provided here is unquantized: it is intended to be quantized to int4 with a tool of your choice before deployment.
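As a concrete illustration, the sketch below loads the unquantized checkpoint and quantizes it to 4 bits at load time via the Hugging Face transformers integration with bitsandbytes. The model id is an assumption (substitute the repository you are actually using), and bitsandbytes 4-bit is just one int4 scheme; other tools such as llama.cpp provide their own int4 formats.

```python
# Minimal sketch: load the unquantized QAT checkpoint with on-the-fly
# 4-bit quantization. Assumes transformers >= 4.50 and bitsandbytes installed.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-4b-it-qat-q4_0-unquantized"  # assumed repo id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear weights to 4 bits at load time
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
)

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```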
Key Capabilities
- Multimodal Understanding: Processes both text and image inputs (images are normalized to 896x896 resolution and encoded to 256 tokens each) and generates text outputs; see the inference sketch after this list.
- Extended Context Window: Features a large 128K token input context window, enabling processing of extensive documents and complex queries.
- Multilingual Support: Trained on data spanning more than 140 languages, enhancing its utility for global applications.
- Versatile Text Generation: Excels in tasks such as question answering, summarization, creative text generation, and reasoning.
- Optimized for Deployment: QAT enables efficient deployment in resource-constrained environments without significant quality degradation.
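The sketch below shows one way to run combined text-and-image inference through the Hugging Face transformers chat-template API; the model id and image URL are placeholders, and the quantized `model` from the snippet above would work here as well.

```python
# Minimal multimodal inference sketch for a Gemma 3 instruction-tuned model.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-4b-it-qat-q4_0-unquantized"  # assumed repo id
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# The processor resizes images to 896x896 and encodes each to 256 tokens.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```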
Good For
- Edge and Local Deployments: Ideal for applications requiring powerful AI capabilities on devices with limited memory.
- Text and Image Analysis: Suitable for tasks involving understanding and generating responses based on combined text and visual information.
- Multilingual Applications: Effective for processing and generating content in a wide array of languages.
- Research and Development: Serves as a foundation for experimenting with vision-language model (VLM) and natural language processing (NLP) techniques, especially where efficient resource utilization is critical.