unsloth/medgemma-4b-pt

Vision · 4.3B parameters · BF16 · 32768-token context
License: health-ai-developer-foundations
Hosted on Hugging Face

Overview

MedGemma 4B: Medical Multimodal AI

MedGemma 4B, developed by Google, is a 4.3-billion-parameter multimodal model built on the Gemma 3 architecture and designed specifically for healthcare AI applications. It pairs a SigLIP image encoder, pre-trained on a wide array of de-identified medical images spanning radiology, histopathology, ophthalmology, and dermatology, with an LLM component trained on diverse medical text.
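
As a quick orientation, the following is a minimal loading sketch in Python. It assumes the standard transformers image-text-to-text auto classes used by Gemma 3-based checkpoints; the exact class names and keyword arguments should be confirmed against the usage examples on the model page.

    # Minimal loading sketch. Class names assume the standard transformers
    # image-text-to-text API used by Gemma 3-based checkpoints; confirm
    # against the usage examples on the model page.
    import torch
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "unsloth/medgemma-4b-pt"

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # weights are published in BF16
        device_map="auto",
    )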

Key Capabilities

  • Multimodal Comprehension: Processes both medical text and images, enabling tasks like visual question answering and report generation (see the inference sketch after this list).
  • Specialized Medical Training: Pre-trained on extensive medical datasets, including MIMIC-CXR, SlakeVQA, and various proprietary datasets, enhancing its understanding of clinical contexts.
  • Strong Baseline Performance: Outperforms the base Gemma 3 4B model across numerous medical benchmarks, including medical image classification (e.g., 88.9 F1 on MIMIC-CXR), visual question answering (e.g., 62.3 F1 on SlakeVQA), and chest X-ray report generation (29.5 RadGraph F1).
  • Fine-tuning Ready: Available in pre-trained (-pt) and instruction-tuned (-it) versions, giving developers flexibility to fine-tune for specific use cases; this repository hosts the pre-trained (-pt) checkpoint.
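
To illustrate the multimodal comprehension point above, here is a hedged inference sketch that continues from the loading code in the Overview (it reuses model and processor). The image URL is a placeholder, and the completion-style prompt is an assumption about how the pre-trained (-pt) checkpoint is best prompted; the instruction-tuned (-it) variant is generally easier to use conversationally.

    # Continues from the loading sketch above (reuses `model` and `processor`).
    # The image URL is a placeholder and the prompt format for the -pt
    # checkpoint is an assumption; adjust both for your data.
    import requests
    import torch
    from PIL import Image

    image_url = "https://example.com/chest_xray.png"  # placeholder image URL
    image = Image.open(requests.get(image_url, stream=True).raw)

    # Base (-pt) models respond to completion-style prompts rather than chat turns.
    prompt = "<start_of_image> Findings:"
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, dtype=torch.bfloat16
    )
    input_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=128)

    # Decode only the newly generated tokens, skipping the prompt and image tokens.
    print(processor.decode(output[0][input_len:], skip_special_tokens=True))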

Intended Use

MedGemma 4B is intended as a foundational model for developers in the life sciences and healthcare to build downstream AI applications. It is particularly well-suited for:

  • Developing tools for medical image analysis and interpretation.
  • Creating systems for medical visual question answering.
  • Generating preliminary medical reports or summaries from images and text.
  • Accelerating research and development in medical AI by providing a robust, medically-aware starting point (a LoRA adaptation sketch follows below).
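
As one way to build on this medically-aware starting point, the sketch below attaches LoRA adapters with the peft library before task-specific training. It reuses the model loaded in the Overview, and the rank, alpha, and target module names are illustrative assumptions rather than recommended settings.

    # Continues from the loading sketch above (reuses `model`).
    # Hypothetical LoRA configuration: rank, alpha, and target module names
    # are assumptions; inspect the loaded model's module names before training.
    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )

    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()  # LoRA trains only a small fraction of the weights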