MedGemma 1.5 4B IT: Cataract Surgical Analysis
This model is a fine-tuned version of Google's MedGemma 1.5 4B IT, designed for analyzing cataract surgery video frames. It uses a Chain-of-Thought (CoT) approach to produce expert-level reasoning followed by safety instructions, which distinguishes it from general-purpose models.
Key Capabilities
- Expert-level Reasoning: Provides a detailed "Thinking Process" that analyzes surgical phases, identifies instrument-anatomy relationships, and assesses safety margins within surgical video frames.
- Actionable Instructions: Generates a "Final Answer" with clear, concise instructions suitable for a surgical resident.
- Structured Output: Delivers responses in a consistent format, separating the reasoning trace from the final instruction.
- Specialized Training: Fine-tuned on the Cataract-1K dataset, a component of the LMOD benchmark, with reasoning traces distilled from Qwen3-VL-30B-A3B-Thinking.
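The structured output described above can be split into its two parts with a small parser. The following is a minimal sketch, not the model's official post-processing: it assumes the sections are introduced by the literal labels "Thinking Process" and "Final Answer" (the exact delimiters are not specified here), and `split_response` and `ParsedResponse` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed section labels; adjust if the model's actual delimiters differ.
_THINK = re.compile(r"thinking process\s*:?", re.IGNORECASE)
_FINAL = re.compile(r"final answer\s*:?", re.IGNORECASE)

# Whitespace and markdown decoration that may surround the labels.
_STRIP = " \t\r\n*#:"


class ParsedResponse(NamedTuple):
    thinking: Optional[str]      # reasoning trace, or None if empty
    final_answer: Optional[str]  # instruction, or None if no label was found


def split_response(text: str) -> ParsedResponse:
    """Separate the reasoning trace from the final instruction."""
    # Use the last "Final Answer" label, since the reasoning itself
    # may mention the phrase before the real section starts.
    finals = list(_FINAL.finditer(text))
    if finals:
        head = text[: finals[-1].start()]
        final = text[finals[-1].end():].strip(_STRIP) or None
    else:
        # No label found: treat the whole response as reasoning.
        head, final = text, None

    think = _THINK.search(head)
    thinking = head[think.end():] if think else head
    return ParsedResponse(thinking.strip(_STRIP) or None, final)
```

For example, `split_response(generated_text).final_answer` would give only the instruction intended for the resident, while `.thinking` keeps the trace available for review.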
Good for
- Research in Medical AI: Ideal for exploring multimodal AI capabilities within surgical domains.
- AI-assisted Surgical Training: Serves as a prototype for educational systems aimed at surgical residents.
- Interpreting Surgical Video: Provides detailed analysis and guidance based on individual frames from cataract surgery videos.
This model was fine-tuned using LoRA on a base model quantized to 4-bit NF4 with double quantization. Training was stable: evaluation loss converged to 0.19–0.24, and token accuracy reached ~0.94 in the best-performing fold.
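The quantization setting named above corresponds to options of `BitsAndBytesConfig` in Hugging Face `transformers`. The sketch below shows one way to express it, assuming that library was used; this card does not confirm the training stack, and `QUANT_KWARGS` and `make_quant_config` are illustrative names. The LoRA rank, target modules, and compute dtype are not stated here, so they are left out rather than guessed.

```python
# Quantization settings from this card, written as keyword arguments
# for transformers.BitsAndBytesConfig (assumed training stack).
QUANT_KWARGS = {
    "load_in_4bit": True,               # 4-bit weights
    "bnb_4bit_quant_type": "nf4",       # NormalFloat4 data type
    "bnb_4bit_use_double_quant": True,  # also quantize the quantization constants
}


def make_quant_config():
    """Build the config object; requires `transformers` to be installed."""
    from transformers import BitsAndBytesConfig  # imported lazily on purpose

    return BitsAndBytesConfig(**QUANT_KWARGS)
```

The resulting object would typically be passed as `quantization_config` to `from_pretrained` when loading the base model, before LoRA adapters are attached.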