ddvd233/QoQ-Med3-VL-8B
The ddvd233/QoQ-Med3-VL-8B model is an 8-billion-parameter multimodal clinical foundation model built on Qwen3-VL-8B and designed for reasoning across heterogeneous clinical data, with a 32,768-token context length. Developed by ddvd233, it integrates medical images, time-series signals, and text reports, and uses Domain-aware Relative Policy Optimization (DRPO) to balance performance across 9 clinical domains. The model targets diagnostic tasks and improves interpretability by highlighting salient regions, making it suitable for advanced clinical research applications.
QoQ-Med3-VL-8B: Multimodal Clinical Foundation Model
QoQ-Med3-VL-8B is an 8-billion-parameter multimodal clinical foundation model based on Qwen3-VL-8B and developed by ddvd233. It is designed for complex clinical decision-making, jointly reasoning across diverse data types: medical images (2D/3D), time-series signals (such as ECG), and text reports. The model is trained with a novel approach called Domain-aware Relative Policy Optimization (DRPO), a reinforcement learning objective that counters performance imbalance under skewed clinical data distributions by scaling rewards according to domain rarity and modality difficulty.
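The exact DRPO objective is not given on this card. As a rough illustration of the general idea of scaling rewards by domain rarity and difficulty, the sketch below starts from GRPO-style group-normalized advantages (a common critic-free baseline) and multiplies them by an inverse-frequency domain weight and a simple difficulty weight. All function names, both weighting formulas, and the assumption that rewards lie in [0, 1] are hypothetical choices for illustration, not the model's actual training code.

```python
import math


def group_relative_advantages(rewards):
    """GRPO-style normalization of rewards within one group of sampled responses."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    if std == 0:
        # All responses scored the same: no relative learning signal.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]


def domain_weights(domain_counts):
    """Hypothetical: inverse-frequency weight per domain, normalized to mean 1."""
    inverse = {d: 1.0 / c for d, c in domain_counts.items()}
    mean_inverse = sum(inverse.values()) / len(inverse)
    return {d: w / mean_inverse for d, w in inverse.items()}


def difficulty_weight(rewards):
    """Hypothetical: groups with lower mean reward (assumed in [0, 1]) weigh more."""
    mean = sum(rewards) / len(rewards)
    return 1.0 + (1.0 - mean)


def domain_scaled_advantages(rewards, domain, weights):
    """Illustrative domain- and difficulty-scaled advantages for one group."""
    base = group_relative_advantages(rewards)
    scale = weights[domain] * difficulty_weight(rewards)
    return [a * scale for a in base]


if __name__ == "__main__":
    # Hypothetical sample counts: a common and a rare domain.
    weights = domain_weights({"radiology": 800, "dermatology": 200})
    print(weights)
    print(domain_scaled_advantages([1.0, 0.0, 1.0, 0.0], "dermatology", weights))
```

In this toy setup the rarer domain receives a larger weight, so its correct answers push the policy harder than equally correct answers from an over-represented domain.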
Key Capabilities
- Heterogeneous Data Integration: Processes and reasons across 1D (e.g., ECG time series), 2D (e.g., medical images), and 3D (e.g., volumetric medical images) clinical data.
- Domain-Aware Training: DRPO is designed to balance learning across 9 distinct clinical domains, including Cardiology, Radiology, Dermatology, Ophthalmology, Pathology, and Mammography.
- Enhanced Interpretability: Generates reasoning traces and highlights salient regions related to diagnoses, achieving significantly higher IoU (intersection over union) than other open models.
- Diagnostic Performance: DRPO improves macro-F1 by an average of 43% across visual domains compared with other critic-free training methods.
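The two metrics cited above can be illustrated with short reference implementations. The card does not say how salient regions are represented or which IoU variant is reported, so this sketch assumes axis-aligned bounding boxes given as (x1, y1, x2, y2). Macro-F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. The function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2); assumes boxes, not masks."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because each class contributes equally to the average, a macro-F1 gain reflects better handling of under-represented diagnoses, which is the imbalance DRPO targets.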
Use Cases
- Clinical Research: Suited to research on advanced diagnostic support systems and multimodal data analysis in medicine.
- Medical Image Analysis: Capable of interpreting various medical imaging modalities and providing relevant insights.
- Multimodal Clinical Reasoning: Suitable for tasks requiring integrated analysis of different clinical data sources to inform decisions.
This model is intended for research purposes only and is not suitable for clinical deployment without extensive real-world testing.