unsloth/gemma-3-1b-pt

Parameters: 1B
Precision: BF16
Context length: 32768 tokens
License: gemma

Gemma 3 1B Pre-trained Model Overview

This model is the 1 billion parameter variant of the Gemma 3 family, developed by Google DeepMind. Gemma 3 is a family of lightweight open models built using the same research and technology as the Gemini models. The larger Gemma 3 models (4B, 12B, and 27B) are multimodal, accepting text and image inputs and generating text outputs, and offer a context window of up to 128K tokens with multilingual support in over 140 languages. The 1B model is text-only and has a 32K token context window. Because this is the pre-trained (base) checkpoint, it continues text rather than following chat-style instructions, so conversational use generally calls for fine-tuning.
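A minimal sketch of plain text completion with this checkpoint, assuming the Hugging Face `transformers` and `torch` packages are installed and that the installed `transformers` release supports Gemma 3. The helper names `clamp_new_tokens` and `complete` are hypothetical and exist only for illustration; the 32768-token limit is the context length listed for this model.

```python
MODEL_ID = "unsloth/gemma-3-1b-pt"
CONTEXT_WINDOW = 32768  # tokens, the context length listed for this model


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context: int = CONTEXT_WINDOW) -> int:
    """Limit generation so prompt plus output stays inside the context window."""
    if prompt_tokens >= context:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, context - prompt_tokens)


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the base model (plain completion, no chat template)."""
    # Imported lazily so clamp_new_tokens works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: BF16 weights, matching the precision listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]
    budget = clamp_new_tokens(prompt_tokens, max_new_tokens)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=budget)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_tokens:], skip_special_tokens=True)
```

Calling `complete("The three primary colors are")` would download the weights on first use and return the model's continuation. Since this is a base model, prompts phrased as the start of a document tend to work better than direct questions.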

Key Capabilities

  • Input: Accepts text strings. The larger multimodal Gemma 3 variants also accept images (normalized to 896x896 resolution and encoded to 256 tokens each); the 1B model is text-only.
  • Text Generation: Generates creative text formats and document summaries, and can serve as a base for chatbots.
  • Image Understanding: Extracting, interpreting, and summarizing visual data is a capability of the multimodal Gemma 3 variants, not of the 1B model.
  • Reasoning & Factuality: Evaluated on reasoning and factuality benchmarks such as HellaSwag, BoolQ, and PIQA.
  • STEM & Code: Evaluated on STEM and code benchmarks such as MMLU, GSM8K, and HumanEval.
  • Multilingual Support: Trained on data including content in over 140 languages, with specific multilingual benchmarks like MGSM and Global-MMLU-Lite.
  • Resource Efficiency: Its relatively small size allows for deployment in environments with limited resources, such as laptops or desktops.
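The resource figures above can be turned into rough arithmetic. The sketch below estimates the memory needed for the weights alone and, for an image-capable variant, how many images fit in a context window at 256 tokens each. It assumes a nominal parameter count of exactly 1e9 and reads "128K" as 131072 tokens; both are approximations, and the function names are hypothetical. Activations, the KV cache, and framework overhead are not included.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def max_images_in_context(context_tokens: int, text_tokens: int = 0,
                          tokens_per_image: int = 256) -> int:
    """Number of images that fit after reserving `text_tokens` for text."""
    return max(0, (context_tokens - text_tokens) // tokens_per_image)


# Nominal 1B parameters in BF16: a little under 2 GiB of weights.
print(round(weight_memory_gib(1e9), 2))   # 1.86
# Assumed 128K (131072-token) window with no text: 512 images.
print(max_images_in_context(131072))      # 512
```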

Good For

  • Content Creation: Generating diverse text formats, marketing copy, and email drafts.
  • Conversational AI: Serving as a base for chatbots and virtual assistants, typically after fine-tuning.
  • Research & Education: Serving as a foundation for NLP research, language learning tools, and knowledge exploration; the multimodal Gemma 3 variants also support VLM research.
  • Image Analysis: Image data extraction and visual question answering (VQA), as measured by benchmarks like COCOcap and DocVQA, apply to the multimodal Gemma 3 variants rather than to the text-only 1B model.