google/gemma-3-4b-pt

Warm
Public
Vision
4.3B
BF16
32768
Feb 20, 2025
License: gemma
Hugging Face
Gated

Gemma 3 4.3B Pre-trained Model Overview

Gemma 3 is Google DeepMind's family of lightweight, multimodal open models, built on the same research and technology as the Gemini models. This 4.3-billion-parameter pre-trained variant accepts text and image input, generates text output, and has a 32,768-token context window. It supports more than 140 languages.

Key Capabilities

  • Multimodal Input: Processes both text strings (questions, prompts, documents) and images (normalized to 896x896 resolution, encoded to 256 tokens each).
  • Text Generation: Generates diverse text outputs, including answers, image content analysis, and document summaries.
  • Extensive Context: Supports a total input context of 32,768 tokens for this size, enabling processing of longer and more complex inputs.
  • Multilingual Support: Trained on web documents in over 140 languages, enhancing its global applicability.
  • Reasoning & STEM: Performs well on reasoning, STEM, and code benchmarks, including MMLU, MATH, and HumanEval.
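
The image and context figures above imply a simple token budget: each image takes 256 tokens out of the 32,768-token input context. The sketch below works through that arithmetic only. The function names are hypothetical, and it assumes that image tokens and text tokens share one input window and that nothing else (such as special tokens) consumes it.

```python
# Figures taken from the capability list above.
CONTEXT_TOKENS = 32_768   # total input context for this model size
TOKENS_PER_IMAGE = 256    # each 896x896 image is encoded to 256 tokens


def remaining_text_tokens(num_images: int) -> int:
    """Tokens left for text after num_images images (hypothetical helper)."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    used = num_images * TOKENS_PER_IMAGE
    if used > CONTEXT_TOKENS:
        raise ValueError("images alone exceed the input context")
    return CONTEXT_TOKENS - used


def max_images(text_tokens: int) -> int:
    """Largest number of images that fits next to text_tokens of text."""
    if not 0 <= text_tokens <= CONTEXT_TOKENS:
        raise ValueError("text_tokens must be within the input context")
    return (CONTEXT_TOKENS - text_tokens) // TOKENS_PER_IMAGE


print(remaining_text_tokens(4))  # 4 images use 1,024 tokens, leaving 31744
print(max_images(1_000))         # a 1,000-token prompt leaves room for 124 images
```

Under these assumptions, a prompt with no text at all could hold at most 128 images, so image-heavy inputs fill the window quickly.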

Intended Use Cases

  • Content Creation: Ideal for generating creative text formats, marketing copy, and email drafts.
  • Conversational AI: Suitable for powering chatbots, virtual assistants, and interactive applications.
  • Information Extraction: Can extract, interpret, and summarize visual data for text communications.
  • Research & Education: Serves as a foundation for VLM and NLP research, language learning tools, and knowledge exploration.