google/gemma-3-1b-pt

Parameters: 1B
Precision: BF16
Context length: 32768 tokens
License: gemma
Source: Hugging Face (gated)

Overview

Gemma 3 is a family of lightweight open models from Google DeepMind, built on the same research and technology as the Gemini models. The larger Gemma 3 variants are multimodal, accepting text and image inputs and generating text outputs; the 1B size is documented by Google as text-only. The Gemma 3 1B PT (pre-trained) variant has a 32K-token context window, and the family's training data covers over 140 languages.

Key Capabilities

  • Multimodal Input (larger Gemma 3 variants): Accepts text strings and images (normalized to 896x896 resolution and encoded to 256 tokens each). The 1B size is documented as accepting text input only.
  • Multilingual Support: Trained on data including content in over 140 languages.
  • Text Generation: Generates creative text formats, powers chatbots, and performs text summarization.
  • Image Understanding (larger Gemma 3 variants): Extracts, interprets, and summarizes visual data for text communications.
  • Reasoning & Factuality: Scores 62.3 on HellaSwag (10-shot) and 63.2 on BoolQ (0-shot), among other reasoning benchmarks.
  • STEM & Code: Handles STEM and code-related tasks; benchmark results such as MMLU and HumanEval are reported for the larger Gemma 3 models.
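
The text-generation capability listed above can be exercised with a plain completion call. The following is a minimal sketch, assuming the Hugging Face `transformers` and `torch` packages are installed (a release recent enough to include Gemma 3) and that access to the gated repository has already been granted. The helper names `fits_context` and `complete` are hypothetical and chosen for illustration. Because this is the pre-trained (PT) checkpoint, the prompt is passed as raw text rather than through a chat template.

```python
# Minimal text-completion sketch for the pre-trained (PT) checkpoint.
# Helper names are illustrative, not part of any library.
MODEL_ID = "google/gemma-3-1b-pt"
CONTEXT_WINDOW = 32768  # tokens, as listed for this model


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if prompt plus generated tokens stay within the context window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy completion of `prompt`; fetches the gated weights on first call."""
    # Imported lazily so fits_context() is usable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]
    if not fits_context(prompt_tokens, max_new_tokens):
        raise ValueError("prompt plus max_new_tokens exceeds the context window")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The three primary colors are")` would return the prompt followed by the model's continuation. Greedy decoding (`do_sample=False`) is used here only to keep the sketch deterministic; sampling settings are a separate design choice.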

When to Use This Model

This model is particularly suitable for:

  • Resource-Constrained Environments: Its relatively small size (1 billion parameters) allows for deployment on laptops, desktops, or personal cloud infrastructure.
  • Text Generation Tasks: Ideal for content creation, chatbots, and summarization.
  • Image-to-Text Applications: Analysis and summarization of image content applies to the larger, image-capable Gemma 3 variants; the 1B size is documented as text-only.
  • Multilingual Applications: Benefits from broad language support for diverse user bases.
  • Research and Education: Serves as a foundation for VLM and NLP research, language learning tools, and knowledge exploration.
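
The resource-constrained deployment point can be made concrete with a rough estimate of weight memory. This is a back-of-the-envelope sketch, assuming a nominal 1 billion parameters (the exact count for this checkpoint may differ) and counting only the stored weights; activations, the KV cache, and framework overhead are not included. The names `weight_bytes` and `to_gib` are hypothetical.

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Excludes activations, KV cache, and runtime overhead.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}

NOMINAL_PARAMS = 1_000_000_000  # assumption: "1B" taken as exactly 1e9


def weight_bytes(num_params: int, dtype: str = "bf16") -> int:
    """Bytes needed to hold the weights at the given precision."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30
```

Under these assumptions, `weight_bytes(NOMINAL_PARAMS, "bf16")` gives 2,000,000,000 bytes, or about 1.86 GiB, which is consistent with running on a laptop or desktop.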