ddvd233/QoQ-Med3-VL-8B-MIMIC
ddvd233/QoQ-Med3-VL-8B-MIMIC is an 8-billion-parameter multimodal clinical foundation model based on Qwen3-VL-8B and fine-tuned with additional data from MIMIC-IV. It is designed for joint reasoning across diverse clinical data types, including medical images, time-series signals, and text reports. The model is trained with Domain-aware Relative Policy Optimization (DRPO) to balance learning across 9 clinical domains, which makes it well suited to complex clinical decision-making and diagnostic support.
QoQ-Med3-VL-8B-MIMIC: A Multimodal Clinical Foundation Model
QoQ-Med3-VL-8B-MIMIC is an 8-billion-parameter multimodal clinical foundation model developed by ddvd233. It builds on the Qwen3-VL-8B architecture and is further fine-tuned with MIMIC-IV data. The model reasons jointly over heterogeneous clinical data, including 2D/3D medical images, time-series signals such as ECGs, and text reports. Its core innovation is training with Domain-aware Relative Policy Optimization (DRPO), a reinforcement learning objective designed for skewed clinical data distributions. DRPO hierarchically scales rewards by domain rarity and modality difficulty, so that rare or hard domains are not drowned out by common or easy ones.
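The reward-scaling idea can be pictured with a small sketch. This is not the authors' DRPO objective, whose exact formula is not given here; it only illustrates the general shape of "scale group-relative advantages by rarity and difficulty". The function names, the square-root rarity term, the `0.5 + difficulty` term, and the example counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's group of sampled answers.

    This is the usual group-relative baseline; eps avoids division by zero
    when every answer in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def domain_scale(domain, domain_counts, domain_mean_reward):
    """Hypothetical scaling factor (an assumption, not the published formula).

    It grows when a domain has few training examples (rarity) and when the
    model's current mean reward on that domain is low (difficulty).
    Rewards are assumed to lie in [0, 1].
    """
    total = sum(domain_counts.values())
    balanced_share = total / len(domain_counts)
    rarity = (balanced_share / domain_counts[domain]) ** 0.5
    difficulty = 1.0 - domain_mean_reward[domain]
    return rarity * (0.5 + difficulty)


def scaled_advantages(rewards, domain, domain_counts, domain_mean_reward):
    """Group-relative advantages multiplied by the domain scaling factor."""
    scale = domain_scale(domain, domain_counts, domain_mean_reward)
    return [scale * a for a in group_relative_advantages(rewards)]


# Made-up numbers: a common, easy domain and a rare, harder one.
example_counts = {"radiology": 900, "ecg": 100}
example_mean_reward = {"radiology": 0.8, "ecg": 0.4}

for name in example_counts:
    factor = domain_scale(name, example_counts, example_mean_reward)
    print(f"{name}: scale factor {factor:.2f}")
```

With these made-up numbers the rare, harder domain receives the larger factor, so its rollouts contribute more to the policy update than they would under uniform weighting.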
Key Capabilities
- Multimodal Integration: Processes and reasons over 1D, 2D, and 3D clinical data types.
- Domain-Aware Training: DRPO ensures balanced learning across 9 diverse clinical domains, including Cardiology, Radiology, Dermatology, Ophthalmology, Pathology, and Mammography.
- Enhanced Interpretability: Generates reasoning traces and highlights salient regions in images, achieving significantly higher intersection over union (IoU) than other open models.
- Strong Performance: Outperforms existing open-source clinical multimodal LLMs (MLLMs) on diagnostic tasks, with DRPO boosting macro-F1 by 43% on average across visual domains.
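For readers unfamiliar with the two metrics named in the list above, here is a minimal sketch of how IoU and macro-F1 are commonly computed. It assumes highlighted regions are axis-aligned bounding boxes in `(x1, y1, x2, y2)` form, which the model card does not specify; the helper names `box_iou` and `macro_f1` and the sample inputs are illustrative only, not part of the model's tooling.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative inputs only; these are not results from the model.
predicted_box = (0, 0, 2, 2)
reference_box = (1, 1, 3, 3)
print(f"IoU: {box_iou(predicted_box, reference_box):.3f}")

truth = ["normal", "normal", "abnormal", "abnormal"]
preds = ["normal", "abnormal", "abnormal", "abnormal"]
print(f"macro-F1: {macro_f1(truth, preds):.3f}")
```

Because macro-F1 averages per-class scores without weighting by class size, a rare diagnosis counts as much as a common one, which is why it is a natural metric for imbalanced clinical data.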
Good for
- Clinical Decision Support: Assisting in complex diagnostic tasks requiring integration of various data modalities.
- Medical Research: Exploring multimodal reasoning in clinical contexts and developing new AI applications for healthcare.
- Interpretability Studies: Leveraging its ability to generate reasoning traces and highlight relevant image regions for understanding AI decisions in medicine.