ddvd233/QoQ-Med3-VL-8B-MIMIC

Vision | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Dec 16, 2025 | License: MIT | Architecture: Transformer | Open Weights | Cold

ddvd233/QoQ-Med3-VL-8B-MIMIC is an 8-billion-parameter multimodal clinical foundation model based on Qwen3-VL-8B and fine-tuned on additional data from MIMIC-IV. It is designed for joint reasoning across clinical data types, including medical images, time-series signals, and text reports. The model is trained with Domain-aware Relative Policy Optimization (DRPO) to balance learning across 9 clinical domains, which targets complex clinical decision-making and diagnostic support.


QoQ-Med3-VL-8B-MIMIC: A Multimodal Clinical Foundation Model

QoQ-Med3-VL-8B-MIMIC is an 8-billion-parameter multimodal clinical foundation model developed by ddvd233. It builds on the Qwen3-VL-8B architecture and is further fine-tuned on MIMIC-IV data. The model is engineered for joint reasoning across heterogeneous clinical data, including 2D/3D medical images, time-series signals such as ECGs, and textual reports. Its core innovation is training with Domain-aware Relative Policy Optimization (DRPO), a reinforcement learning objective that counters the performance imbalances caused by skewed clinical data distributions by hierarchically scaling rewards according to domain rarity and modality difficulty.
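The idea of scaling rewards by domain rarity and difficulty can be illustrated with a small sketch. This is not the DRPO objective used to train the model: the function names, the `temperature` exponent, the `0.5 + difficulty` term, and the assumption that rewards lie in [0, 1] are all hypothetical choices made for illustration. The sketch standardizes rewards within a group of sampled answers (as in group-relative policy optimization) and then multiplies them by a per-domain factor that grows for rarer and harder domains.

```python
from collections import Counter
from statistics import mean, pstdev


def group_advantages(rewards):
    """Standardize rewards within one prompt's group of sampled answers."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All answers scored the same, so there is no learning signal.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def domain_scales(domains, mean_reward, temperature=0.5):
    """Per-domain scale factors (hypothetical formula, not the published one).

    domains:     the domain label of every sample in the batch
    mean_reward: dict of domain -> running mean reward, assumed to be in [0, 1]
    """
    counts = Counter(domains)
    total = len(domains)
    raw = {}
    for domain, n in counts.items():
        rarity = (total / n) ** temperature      # larger for rarer domains
        difficulty = 1.0 - mean_reward[domain]   # larger for harder domains
        raw[domain] = rarity * (0.5 + difficulty)
    # Normalize so the average scale over the samples in the batch is 1.
    batch_mean = mean(raw[d] for d in domains)
    return {d: v / batch_mean for d, v in raw.items()}


def scaled_advantages(rewards, domain, scales):
    """Group-relative advantages multiplied by the domain's scale factor."""
    return [scales[domain] * a for a in group_advantages(rewards)]


if __name__ == "__main__":
    batch = ["radiology"] * 8 + ["ophthalmology"] * 2
    scales = domain_scales(batch, {"radiology": 0.8, "ophthalmology": 0.4})
    print(scales)
    print(scaled_advantages([1.0, 0.0, 0.0, 1.0], "ophthalmology", scales))
```

In this toy batch, ophthalmology is both rarer and harder than radiology, so its samples receive a larger scale factor and contribute more to the policy update than they would under uniform weighting.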

Key Capabilities

  • Multimodal Integration: Processes and reasons over 1D, 2D, and 3D clinical data types.
  • Domain-Aware Training: DRPO ensures balanced learning across 9 diverse clinical domains, including Cardiology, Radiology, Dermatology, Ophthalmology, Pathology, and Mammography.
  • Enhanced Interpretability: Generates reasoning traces and highlights salient regions in images, achieving significantly higher IoU (intersection over union) with reference regions than other open models.
  • Strong Performance: Outperforms existing open-source clinical MLLMs in diagnostic performance, with DRPO boosting macro-F1 by 43% on average across visual domains.
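The two metrics named in the list above, IoU and macro-F1, can be computed as in the following sketch. It assumes that salient regions are axis-aligned boxes given as `(x1, y1, x2, y2)` and that diagnoses are single-label class predictions; the model's actual evaluation protocol may differ (for example, mask-based IoU), and the helper names are illustrative.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because macro-F1 averages per-class scores without weighting by class frequency, it rewards balanced performance across common and rare conditions, which is the kind of imbalance DRPO is described as addressing.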

Good for

  • Clinical Decision Support: Assisting in complex diagnostic tasks requiring integration of various data modalities.
  • Medical Research: Exploring multimodal reasoning in clinical contexts and developing new AI applications for healthcare.
  • Interpretability Studies: Leveraging its ability to generate reasoning traces and highlight relevant image regions for understanding AI decisions in medicine.