google/medgemma-4b-it

Parameters: 4.3B
Precision: BF16
Context length: 32,768 tokens
Released: May 19, 2025
License: other
Hosted on: Hugging Face (gated)
Modality: text + vision
Overview

MedGemma-4b-it: Specialized Multimodal AI for Healthcare

MedGemma-4b-it is a 4.3 billion parameter instruction-tuned model from Google, built on the Gemma 3 architecture and optimized for healthcare AI applications. It integrates a SigLIP image encoder pre-trained on a wide array of de-identified medical images, including chest X-rays, dermatology photographs, ophthalmology images, and histopathology slides. The model's language component is trained on diverse medical text and question-answer pairs, enabling robust comprehension and generation in clinical contexts.
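For orientation, instruction-tuned Gemma-family multimodal models are typically driven through the Hugging Face transformers "image-text-to-text" pipeline with chat-style messages. The sketch below is illustrative, not an excerpt from the official model card: the system/user prompt text and the image URL are placeholders, and because the model is gated, the actual load-and-generate step only runs when you opt in via a hypothetical RUN_MEDGEMMA_DEMO environment variable.

```python
# Illustrative sketch: chat-style multimodal prompting for MedGemma-4b-it via
# the transformers "image-text-to-text" pipeline. The model is gated on
# Hugging Face, so the load/generate step below only runs when opted in.
import os

# Messages mixing text and an image reference; the prompt wording and the
# X-ray URL are placeholders, not values from the model card.
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are an expert radiologist."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chest X-ray."},
            {"type": "image", "url": "https://example.com/chest_xray.png"},
        ],
    },
]

if os.environ.get("RUN_MEDGEMMA_DEMO"):  # needs gated-model access and a GPU
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "image-text-to-text",
        model="google/medgemma-4b-it",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    out = pipe(text=messages, max_new_tokens=200)
    print(out[0]["generated_text"][-1]["content"])
else:
    # Without access, just show the prompt structure.
    print(len(messages), messages[1]["content"][0]["type"])
```

The two-part `content` list is the key pattern: each user turn can interleave any number of text and image entries, and the processor handles image normalization internally.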

Key Capabilities

  • Multimodal Medical Comprehension: Processes both medical text and images, including radiology, dermatology, and pathology. Images are normalized to 896x896 resolution and encoded to 256 tokens.
  • Enhanced Medical Performance: Significantly outperforms the base Gemma 3 4B model across various medical benchmarks, including medical image classification (e.g., MIMIC CXR macro F1 88.9), visual question answering (e.g., SLAKE Tokenized F1 72.3), and text-only medical reasoning (e.g., MedQA 64.4).
  • Text Generation for Healthcare: Optimized for applications requiring text generation, such as chest X-ray report generation, achieving a RadGraph F1 of 30.3 when fine-tuned for CXR reporting.
  • Long Context Support: Features a 128K-token context window, allowing for extensive input processing.
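The fixed image-encoding figures above make input budgeting simple arithmetic: every image costs 256 tokens regardless of its original resolution. A minimal sketch, assuming only the numbers stated in this card (the helper function is illustrative, not part of any official API):

```python
# Back-of-the-envelope input budgeting for MedGemma-4b-it, using the figures
# stated above: each image is normalized to 896x896 and always encodes to a
# fixed 256 tokens. This helper is illustrative, not an official API.
TOKENS_PER_IMAGE = 256

def input_token_estimate(num_images: int, prompt_tokens: int) -> int:
    """Estimate total input tokens for a multimodal prompt."""
    return num_images * TOKENS_PER_IMAGE + prompt_tokens

# Example: a two-view chest X-ray study plus a ~300-token instruction.
print(input_token_estimate(num_images=2, prompt_tokens=300))  # 812
```

Because the per-image cost is flat, even multi-image studies consume only a small slice of the context window, leaving ample room for clinical history and generated output.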

Good for

  • Developers building healthcare AI applications that require both text and image understanding.
  • Tasks such as medical visual question answering, medical report generation, and medical text analysis.
  • Fine-tuning for specific clinical use cases using proprietary data to achieve improved performance.
  • Applications where strong baseline medical image and text comprehension is crucial at this model size.
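For the fine-tuning path mentioned above, parameter-efficient methods such as LoRA are a common starting point at this model size. The hyperparameters below are a hypothetical baseline, not recommendations from the model card, and the target-module names are an assumption about the attention projection layers; with the `peft` library installed they would map directly onto `LoraConfig(**lora_hparams)`.

```python
# Hypothetical LoRA fine-tuning hyperparameters for MedGemma-4b-it. These
# values are illustrative defaults, not recommendations from the model card.
lora_hparams = {
    "r": 16,                 # low-rank adapter dimension
    "lora_alpha": 32,        # adapter scaling numerator
    "lora_dropout": 0.05,    # regularization on the adapter path
    # Assumed attention projection names; verify against the actual modules.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}

# The effective scaling applied to each adapted weight is lora_alpha / r.
print(lora_hparams["lora_alpha"] / lora_hparams["r"])  # 2.0
```

Raising `r` increases adapter capacity at the cost of more trainable parameters; the alpha/r ratio controls how strongly the adapters perturb the frozen base weights.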