MedGemma 1.5 4B IT: Specialized Multimodal AI for Healthcare

MedGemma 1.5 4B IT, developed by Google, is a 4.3 billion parameter multimodal instruction-tuned model built upon the Gemma 3 architecture. It is specifically designed and trained for advanced medical text and image comprehension, making it a powerful foundation for healthcare AI applications. This updated version significantly expands its capabilities beyond MedGemma 1, offering improved accuracy in medical text reasoning and 2D image interpretation.

Key Capabilities

High-dimensional Medical Imaging: Interprets 3D CT and MRI volumes.
Whole-slide Histopathology (WSI): Processes multiple patches from WSI as input.
Longitudinal Medical Imaging: Interprets chest X-rays in context of prior images.
Anatomical Localization: Performs bounding box-based localization in chest X-rays.
Medical Document Understanding: Extracts structured data from unstructured lab reports.
Electronic Health Record (EHR) Understanding: Interprets text-based EHR data.
Multimodal Training: Utilizes a SigLIP image encoder pre-trained on diverse de-identified medical data (X-rays, dermatology, ophthalmology, histopathology) and an LLM component trained on medical text, Q&A, FHIR-based EHR, and various medical images.

Good For

Developers building healthcare-based AI applications that involve text generation from medical inputs.
Fine-tuning on proprietary medical datasets for specific tasks.
Visual question answering related to medical images.
Medical document understanding and EHR interpretation.

Overview

MedGemma 1.5 4B IT: Specialized Multimodal AI for Healthcare

Key Capabilities

Good For

Full Model Card (README)