Gemma 3 1B Instruction-Tuned Model by Google DeepMind
This model is a 1 billion parameter instruction-tuned variant from the Gemma 3 family, developed by Google DeepMind. Built on the same research and technology as the Gemini models, the Gemma 3 family is multimodal: the larger sizes process both text and image inputs (images normalized to 896x896 resolution), while the 1B variant accepts text input only. All sizes generate text outputs. The 1B version features a 32K token input context window and an 8192 token output limit.
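As a quick orientation, here is a minimal sketch of generating text with the Hugging Face Transformers pipeline API. It assumes transformers 4.50 or later (roughly when Gemma 3 support landed), a local PyTorch install, and access to the gated google/gemma-3-1b-it checkpoint on Hugging Face; the prompt text is purely illustrative.

```python
# Minimal sketch: text generation via the Transformers pipeline API.
# Assumes transformers >= 4.50 (Gemma 3 support), PyTorch, and access to
# the gated google/gemma-3-1b-it checkpoint; the prompt is illustrative.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    torch_dtype=torch.bfloat16,
)

# The pipeline accepts chat-style message lists and applies the model's
# chat template automatically.
messages = [
    {"role": "user", "content": "Explain what a context window is in one sentence."}
]
result = generator(messages, max_new_tokens=128)

# The assistant's reply is the last message in the returned conversation.
print(result[0]["generated_text"][-1]["content"])
```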
Key Capabilities
- Multimodal Understanding: The larger Gemma 3 sizes handle text and image inputs for tasks like image analysis and visual data extraction; the 1B variant is text-only.
- Multilingual Support: Trained on data covering more than 140 languages.
- Diverse Task Performance: Excels at text generation, question answering, summarization, and reasoning.
- Resource-Efficient: Its small parameter count makes it suitable for deployment in resource-limited environments such as laptops and desktops (see the quantization sketch after this list).
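To make the resource-efficiency point concrete, below is a hedged sketch of low-memory loading via 4-bit quantization with bitsandbytes through Transformers. It assumes a CUDA-capable GPU plus the transformers, torch, and bitsandbytes packages; the quantization settings are illustrative choices, not values prescribed by the model card.

```python
# Hedged sketch: low-memory loading via 4-bit quantization with
# bitsandbytes. Assumes a CUDA GPU plus the transformers, torch, and
# bitsandbytes packages; settings below are illustrative, not values
# from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",
    quantization_config=quant_config,
    device_map="auto",                      # place layers automatically
)
```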
Training and Evaluation
The 1B model was trained on 2 trillion tokens of data spanning web documents, code, mathematics, and (for the multimodal sizes) images. The training data underwent rigorous filtering to remove CSAM and sensitive personal information. Evaluation benchmarks demonstrate its capabilities across reasoning, STEM, code, and multilingual tasks, with notable improvements in safety categories relative to previous Gemma models. Training used TPUs with JAX and ML Pathways.
Intended Usage
This model is well-suited for content creation (text generation, chatbots), research (NLP/VLM research, language learning tools), and knowledge exploration. It offers strong performance for its size and was developed with responsible AI practices in mind.
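For chatbot-style usage, instruction-tuned Gemma checkpoints expect conversations formatted with the model's chat template. The sketch below shows one illustrative turn; it assumes the `model` and `tokenizer` objects from the quantized-loading sketch above.

```python
# Sketch of one chat turn; assumes `model` and `tokenizer` from the
# quantized-loading sketch above. The question is illustrative.
messages = [
    {"role": "user", "content": "Summarize photosynthesis in two sentences."},
]

# Format the conversation with the model's chat template and append the
# header that cues the assistant's reply.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```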