MedGemma 1.5 4B IT — Cataract Surgical Analysis

This model is a fine-tuned version of Google's MedGemma 1.5 4B IT, built for analyzing cataract surgery video frames. It uses a Chain-of-Thought (CoT) approach to produce expert-level reasoning and safety instructions, which distinguishes it from general-purpose models.

Key Capabilities

Expert-level Reasoning: Provides a detailed "Thinking Process" that analyzes surgical phases, identifies instrument-anatomy relationships, and assesses safety margins within surgical video frames.
Actionable Instructions: Generates a "Final Answer" with clear, concise instructions suitable for a surgical resident.
Structured Output: Delivers responses in a consistent format, separating the reasoning trace from the final instruction.
Specialized Training: Fine-tuned on the Cataract-1K dataset, a component of the LMOD benchmark, with reasoning traces distilled from Qwen3-VL-30B-A3B-Thinking.
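
Because responses separate a "Thinking Process" from a "Final Answer", downstream code can split the two parts. The following is a minimal sketch, not the model's official parser. It assumes the two section names appear literally in the output followed by a colon; the card does not specify the exact delimiters, so the patterns, the `ParsedResponse` type, and the `split_response` helper are all hypothetical and may need adjusting to the real output.

```python
import re
from typing import NamedTuple


class ParsedResponse(NamedTuple):
    thinking: str      # reasoning trace ("Thinking Process")
    final_answer: str  # instruction text ("Final Answer"), "" if not found


# Assumed markers: the section names come from the model card, but the
# trailing colon and the case-insensitive match are guesses about the format.
_THINK_RE = re.compile(r"thinking process\s*:", re.IGNORECASE)
_FINAL_RE = re.compile(r"final answer\s*:", re.IGNORECASE)


def split_response(text: str) -> ParsedResponse:
    """Split raw model output into reasoning trace and final instruction."""
    think = _THINK_RE.search(text)
    # The reasoning starts after the thinking marker, or at the very
    # beginning of the text when that marker is missing.
    start = think.end() if think else 0
    final = _FINAL_RE.search(text, start)
    if final is None:
        # No final-answer marker: return everything as reasoning.
        return ParsedResponse(text[start:].strip(), "")
    return ParsedResponse(
        text[start:final.start()].strip(),
        text[final.end():].strip(),
    )


# Hypothetical sample output, used only to illustrate the split.
sample = (
    "Thinking Process: The instrument tip is close to the capsule.\n"
    "Final Answer: Keep the tip centered and reduce power."
)
parsed = split_response(sample)
print(parsed.final_answer)
```

Returning an empty `final_answer` instead of raising lets a caller decide how to handle truncated or malformed generations.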

Good for

Research in Medical AI: Ideal for exploring multimodal AI capabilities within surgical domains.
AI-assisted Surgical Training: Serves as a prototype for educational systems aimed at surgical residents.
Interpreting Surgical Video: Excels at providing detailed analysis and guidance based on individual frames from cataract surgery videos.

This model was fine-tuned with LoRA on a 4-bit quantized base (NF4 with double quantization). Training was stable: evaluation loss converged to 0.19–0.24, and token accuracy reached ~0.94 in the best-performing fold.
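
For readers who want to reproduce a similar setup, the quantization setting named above maps naturally onto the `BitsAndBytesConfig` class in Hugging Face `transformers`. The sketch below is an illustration under assumptions, not the author's training script: only the 4-bit, NF4, and double-quantization flags come from this card. The compute dtype is an assumed default, and LoRA hyperparameters (rank, alpha, target modules) are left out because the card does not state them.

```python
# Flags taken from the card's "4bit-nf4-double_quant" description.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def build_quant_config(compute_dtype_name: str = "bfloat16"):
    """Build a BitsAndBytesConfig from the flags above.

    The compute dtype is an assumption; the card does not specify it.
    Imports are local so this file can be read without torch installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        **QUANT_KWARGS,
        bnb_4bit_compute_dtype=getattr(torch, compute_dtype_name),
    )
```

The resulting object would typically be passed as `quantization_config` when loading the base model with `from_pretrained`, before attaching LoRA adapters.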
