unsloth/medgemma-4b-it

  • Status: Warm
  • Visibility: Public
  • Modality: Vision
  • Parameters: 4.3B
  • Precision: BF16
  • Context length: 32768
  • Released: May 20, 2025
  • License: other
  • Source: Hugging Face
Overview

MedGemma 4B Instruction-Tuned Model

MedGemma 4B IT is a 4.3-billion-parameter multimodal model from Google, built on the Gemma 3 architecture and optimized for healthcare AI applications. It pairs a SigLIP image encoder, pre-trained on a wide array of de-identified medical images (chest X-rays, dermatology, ophthalmology, and histopathology), with an LLM component trained on diverse medical text and question-answer pairs.

Key Capabilities

  • Multimodal Medical Comprehension: Processes both medical text and images (normalized to 896x896 resolution) to generate text outputs.
  • Specialized Medical Training: Significantly outperforms base Gemma 3 4B on medical image classification, visual question answering, and text-only medical benchmarks.
  • Report Generation: Demonstrates strong performance in generating chest X-ray reports, and can be fine-tuned to further improve accuracy against task-specific ground-truth reports.
  • Long Context Support: Accepts a context of at least 128K tokens, allowing long medical documents and extended inputs.
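As an illustrative sketch of how the multimodal capabilities above might be exercised, the snippet below prepares a chat-format request combining an image with a text instruction. The 896x896 resize mirrors the model's input normalization; the prompt text and the placeholder image are assumptions, and actual inference (shown commented out) requires downloading the model weights.

```python
from PIL import Image

# MedGemma normalizes images to 896x896; resize a placeholder "radiograph" accordingly.
image = Image.new("L", (2048, 2500))  # stand-in for a real chest X-ray
image = image.resize((896, 896))

# Chat-format request mixing an image part and a text part, in the layout
# used by Transformers' multimodal chat templates.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe the key findings in this chest X-ray."},
        ],
    },
]

# Illustrative inference call (requires the model weights; not run here):
# from transformers import pipeline
# pipe = pipeline("image-text-to-text", model="unsloth/medgemma-4b-it")
# out = pipe(text=messages, max_new_tokens=256)
```

The same message structure extends to text-only queries by omitting the image part.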

Good For

  • Developing Healthcare AI Applications: Serves as an efficient starting point for applications requiring medical text and image understanding.
  • Medical Text Generation: Ideal for generating text responses, analyses, or summaries from medical inputs.
  • Visual Question Answering: Excels at answering questions based on medical images across various modalities.
  • Fine-tuning: Designed to be fine-tuned by developers with proprietary data for specific medical tasks, offering strong baseline performance for adaptation.
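For the fine-tuning use case, proprietary data typically has to be converted into a chat format before supervised training. The sketch below shows one plausible conversion of (image, report) pairs into chat-style examples; the field names, prompt wording, and sample data are assumptions for illustration, not a documented MedGemma schema.

```python
def to_chat_example(image_path: str, report_text: str) -> dict:
    """Convert one (image, report) pair into a chat-style training example.

    The message layout mirrors the multimodal chat format used by
    Gemma-family instruction-tuned models; exact keys may need adjusting
    for a given fine-tuning framework (e.g. TRL's SFTTrainer).
    """
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image_path},
                    {"type": "text", "text": "Write a radiology report for this chest X-ray."},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": report_text}],
            },
        ]
    }

# Hypothetical proprietary dataset: image paths paired with ground-truth reports.
pairs = [
    ("cxr_001.png", "No acute cardiopulmonary abnormality."),
    ("cxr_002.png", "Right lower lobe consolidation consistent with pneumonia."),
]
dataset = [to_chat_example(path, report) for path, report in pairs]
```

Each example pairs the user's image-plus-instruction turn with the ground-truth report as the assistant turn, which is the usual shape for supervised fine-tuning of instruction-tuned models.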