unsloth/medgemma-1.5-4b-it

Status: Warm
Visibility: Public
Modality: Vision
Parameters: 4.3B
Precision: BF16
Context: 32768
Date: Jan 14, 2026
License: other
Source: Hugging Face
Overview

MedGemma 1.5 4B: Specialized Multimodal AI for Healthcare

MedGemma 1.5 4B, developed by Google, is a 4-billion-parameter multimodal, instruction-tuned model built on the Gemma 3 architecture. It is designed and trained for advanced comprehension of medical text and images, serving as a robust foundation for healthcare AI applications.

Key Capabilities

  • High-dimensional Medical Imaging: Interprets 3D CT and MRI volumes.
  • Whole-slide Histopathology (WSI): Processes multiple patches from whole slide images.
  • Longitudinal Medical Imaging: Analyzes sequences of images, such as comparing current and prior chest X-rays.
  • Anatomical Localization: Performs bounding box-based localization of features in chest X-rays.
  • Medical Document Understanding: Extracts structured data from unstructured medical lab reports.
  • Electronic Health Record (EHR) Interpretation: Understands text-based EHR data.
  • Improved Accuracy: Delivers enhanced accuracy in medical text reasoning and 2D image interpretation compared to its predecessor, MedGemma 1 4B.
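As a rough illustration of how the longitudinal-imaging capability above might be invoked, the sketch below builds a multimodal chat payload comparing a current and a prior chest X-ray. The image paths are placeholders, and the `image-text-to-text` pipeline usage follows the generic Hugging Face `transformers` chat convention rather than instructions from this card; the model call itself is left uninvoked because it would download the full weights.

```python
# Hypothetical example: comparing a current and prior chest X-ray.
# Image paths are placeholders; the message format follows the generic
# Hugging Face multimodal chat convention (an assumption, not taken
# from this model card).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "cxr_prior.png"},
            {"type": "image", "image": "cxr_current.png"},
            {
                "type": "text",
                "text": "Compare the current chest X-ray with the prior "
                        "study and describe any interval changes.",
            },
        ],
    }
]

def run_inference(messages):
    # Requires `pip install transformers accelerate` and downloads ~4.3B
    # weights, so it is deliberately not executed here.
    from transformers import pipeline
    pipe = pipeline("image-text-to-text", model="unsloth/medgemma-1.5-4b-it")
    return pipe(text=messages, max_new_tokens=256)

# Sanity-check the payload shape without calling the model.
n_images = sum(1 for part in messages[0]["content"] if part["type"] == "image")
print(n_images)  # 2
```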

Training and Architecture

MedGemma 1.5 4B uses a SigLIP image encoder pre-trained on diverse de-identified medical data, including radiology, histopathology, ophthalmology, and dermatology images. The LLM component is trained on a broad spectrum of medical data, encompassing text, Q&A pairs, FHIR-based EHR data, and various 2D/3D medical images. It employs a decoder-only Transformer architecture with grouped-query attention and supports a context window of at least 128K tokens.
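Grouped-query attention, mentioned above, shares each key/value head across a group of query heads, shrinking the KV cache relative to full multi-head attention. A minimal NumPy sketch (the head counts and dimensions are illustrative, not MedGemma's actual configuration):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each KV head is shared by n_q_heads // n_kv_heads query heads."""
    n_q_heads, n_kv_heads = q.shape[0], k.shape[0]
    group_size = n_q_heads // n_kv_heads
    # Replicate each KV head across its group of query heads.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 32))   # 8 query heads
k = rng.standard_normal((2, 16, 32))   # only 2 KV heads to cache
v = rng.standard_normal((2, 16, 32))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 16, 32)
```

Here the KV cache holds 2 heads instead of 8, a 4x reduction, while each query head still attends over the full sequence.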

Good For

  • Developers building healthcare AI applications that require joint medical image and text comprehension.
  • Fine-tuning for specific medical tasks using proprietary data.
  • Applications involving visual question answering, document understanding, and textual medical questions.
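Fine-tuning on proprietary data is typically done parameter-efficiently. The sketch below shows a hypothetical LoRA setup via the Unsloth library (matching this repo's publisher); the hyperparameter values and the exact API usage are assumptions, not recommendations from this card, and the loading function is left uninvoked because it downloads the weights.

```python
# Illustrative LoRA hyperparameters (hypothetical values, not taken
# from this model card).
lora_hparams = {
    "r": 16,                 # low-rank adapter dimension
    "lora_alpha": 16,        # adapter scaling factor
    "lora_dropout": 0.0,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

def load_for_finetuning():
    # Requires `pip install unsloth`; API usage is an assumption and the
    # call is deliberately not executed here.
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        "unsloth/medgemma-1.5-4b-it",
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(model, **lora_hparams)
    return model, tokenizer

print(sorted(lora_hparams))
```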