google/medgemma-4b-it
Hugging Face
Vision · Concurrency Cost: 1 · Model Size: 4.3B · Quant: BF16 · Ctx Length: 32K · Published: May 19, 2025 · License: other · Architecture: Transformer · 0.9K · Gated · Warm

MedGemma-4b-it is a 4.3-billion-parameter instruction-tuned variant of Google's Gemma 3 model, trained specifically for medical text and image comprehension. It uses a SigLIP image encoder pre-trained on diverse de-identified medical data, including chest X-ray, dermatology, ophthalmology, and histopathology images. The multimodal model excels at medical text generation, visual question answering, and report generation, outperforming base Gemma 3 models on clinically relevant benchmarks. The underlying model supports a context length of at least 128K tokens (this deployment serves 32K).


MedGemma-4b-it: Specialized Multimodal AI for Healthcare

MedGemma-4b-it is a 4.3-billion-parameter instruction-tuned model from Google, built on the Gemma 3 architecture and optimized for healthcare AI applications. It integrates a SigLIP image encoder pre-trained on a wide array of de-identified medical images, including chest X-rays, dermatology photographs, ophthalmology images, and histopathology slides. The language component is trained on diverse medical text and question-answer pairs, enabling robust comprehension and generation in clinical contexts.

Key Capabilities

  • Multimodal Medical Comprehension: Processes both medical text and images, including radiology, dermatology, and pathology inputs. Images are normalized to 896x896 resolution and encoded to 256 tokens each (see the inference sketch after this list).
  • Enhanced Medical Performance: Significantly outperforms the base Gemma 3 4B model across various medical benchmarks, including medical image classification (e.g., MIMIC CXR macro F1 88.9), visual question answering (e.g., SLAKE Tokenized F1 72.3), and text-only medical reasoning (e.g., MedQA 64.4).
  • Text Generation for Healthcare: Optimized for applications requiring text generation, such as chest X-ray report generation, achieving a RadGraph F1 of 30.3 when tuned for CXR.
  • Long Context Support: The underlying model features a context length of at least 128K tokens, allowing for extensive input processing.
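
A minimal inference sketch for these multimodal capabilities is shown below. It assumes a recent transformers release that includes the image-text-to-text pipeline, an accepted license for this gated model on Hugging Face, and a placeholder image URL; the processor handles the 896x896 normalization and 256-token image encoding automatically.

```python
import requests
import torch
from PIL import Image
from transformers import pipeline

# Placeholder URL; substitute any chest X-ray or other clinical image.
url = "https://example.com/chest_xray.png"
image = Image.open(requests.get(url, stream=True).raw)

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are an expert radiologist."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray."},
        ],
    },
]

output = pipe(text=messages, max_new_tokens=256)
# The pipeline returns the full chat; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```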

Good for

  • Developers building healthcare-based AI applications that require both text and image understanding.
  • Tasks such as medical visual question answering, medical report generation, and medical text analysis.
  • Fine-tuning for specific clinical use cases on proprietary data to achieve improved performance (a minimal LoRA sketch follows this list).
  • Applications where a strong baseline in medical image and text comprehension is crucial for models of its size.
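
For the fine-tuning path above, here is a minimal LoRA sketch, assuming the peft library is installed alongside transformers; the target modules, rank, and other hyperparameters are illustrative starting points, not tuned recommendations.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "google/medgemma-4b-it"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA adapters on the attention projections only; values are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with transformers.Trainer or TRL's SFTTrainer on
# (image, instruction, response) examples prepared with the processor.
```
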
Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model, covering: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
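
As a usage illustration, sampler parameters like these can be passed through an OpenAI-compatible chat completions call. The base URL, API key handling, and parameter values below are assumptions for illustration, not the community presets themselves; non-standard knobs such as top_k, min_p, and repetition_penalty typically travel via extra_body on OpenAI-compatible servers.

```python
from openai import OpenAI

# Base URL is an assumption about Featherless's OpenAI-compatible
# endpoint; replace the key placeholder with your own credential.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="google/medgemma-4b-it",
    messages=[
        {"role": "user", "content": "List common causes of pleural effusion."}
    ],
    temperature=0.7,        # illustrative values, not the
    top_p=0.9,              # community presets listed above
    frequency_penalty=0.0,
    presence_penalty=0.0,
    max_tokens=256,
    # Non-standard sampler knobs pass through extra_body on many
    # OpenAI-compatible servers (assumption; check server docs).
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```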