ClinicDx1/ClinicDx
ClinicDx1/ClinicDx is a fine-tuned multimodal clinical decision support (CDS) model based on Google's MedGemma 4B Instruct. It integrates a retrieval-augmented knowledge base (KB) and an audio input pathway for voice-driven clinical observation extraction. The 4.3 billion parameter model generates structured, evidence-grounded clinical assessments from patient presentations and can be deployed offline on consumer hardware. It is intended for healthcare professionals and provides structured differential diagnoses and treatment planning with KB citations.
ClinicDx V1: Multimodal Clinical Decision Support
ClinicDx V1 is an open-source, trimodal inference system designed for edge clinical AI, developed by ClinicDx1. It combines a medical ASR encoder, a learned audio projector, and a fine-tuned 4.3 billion parameter clinical LLM (based on google/medgemma-4b-it) into a single llama.cpp binary, enabling fully offline deployment on consumer hardware.
Key Capabilities
- Structured Clinical Assessments: Generates 6-section responses including Alert Level, Clinical Assessment, Differential Considerations, Recommended Actions, Safety Alerts, and Key Points.
- Retrieval-Augmented Generation (RAG): Integrates a knowledge base (who_knowledge_vec_v2.mv2) of 27,860 chunks from WHO and MSF clinical guidelines, using a multi-turn ReAct loop for dynamic retrieval.
- Voice-Driven Input: Features a MedASR encoder and a lightweight AudioProjector (11.8M trainable parameters) to process audio inputs, enabling voice-to-CDS inference by reusing Gemma3's image token injection mechanism.
- Offline Deployment: Designed for full offline operation, making it suitable for low-resource clinical settings.
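The multi-turn ReAct loop mentioned above can be pictured roughly as follows. This is a minimal sketch, not the project's implementation: the `generate` and `search_kb` callables, the `SEARCH:` / `OBSERVATION:` / `FINAL:` markers, and the `max_turns` budget are all hypothetical names chosen for illustration. The idea it shows is that the model either requests a KB lookup or gives a final answer, and retrieved chunks are appended to the transcript before the next turn.

```python
from typing import Callable, List


def react_loop(
    case: str,
    generate: Callable[[str], str],          # hypothetical: prompt text -> model reply
    search_kb: Callable[[str], List[str]],   # hypothetical: query -> guideline chunks
    max_turns: int = 3,                      # assumed turn budget
) -> str:
    """Alternate between model turns and KB lookups until a final answer appears."""
    transcript = f"CASE: {case}\n"
    for _ in range(max_turns):
        reply = generate(transcript).strip()
        transcript += reply + "\n"
        # Assumed convention: a reply starting with "SEARCH:" requests retrieval.
        if reply.startswith("SEARCH:"):
            query = reply[len("SEARCH:"):].strip()
            for chunk in search_kb(query):
                transcript += f"OBSERVATION: {chunk}\n"
            continue
        # Any other reply is treated as the final assessment.
        return reply
    # Turn budget exhausted: ask for a final answer using what was retrieved.
    return generate(transcript + "FINAL:\n").strip()
```

In a real deployment `generate` would wrap the local llama.cpp inference call and `search_kb` would query the guideline knowledge base. Both are left abstract here because their interfaces are not described in this card.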
Good For
- Clinical Decision Support: Assisting trained healthcare professionals with structured differential diagnosis generation and evidence-grounded treatment planning.
- Voice-Driven Observation Extraction: Extracting clinical observations from audio inputs in environments where manual data entry is challenging.
- Research and Development: Providing an open-source platform for further research into multimodal clinical AI, particularly for edge computing applications.
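Downstream tools that consume the six-section assessment usually need it split into named fields. The sketch below shows one way to do that, under an assumption: the exact output layout is not specified in this card, so the code assumes each section title starts its own line, optionally wrapped in markdown markers (`#`, `*`) and optionally followed by a colon. `split_sections` is a hypothetical helper, not part of the released system.

```python
import re
from typing import Dict

# Section titles as listed under Key Capabilities.
SECTIONS = [
    "Alert Level",
    "Clinical Assessment",
    "Differential Considerations",
    "Recommended Actions",
    "Safety Alerts",
    "Key Points",
]


def split_sections(text: str) -> Dict[str, str]:
    """Split a response into {section title: body}, skipping missing sections."""
    names = "|".join(re.escape(s) for s in SECTIONS)
    # Assumed layout: title at line start, optional markdown markers and colon.
    pattern = re.compile(
        rf"^[#* \t]*({names})\**:?\**[ \t]*",
        re.MULTILINE | re.IGNORECASE,
    )
    matches = list(pattern.finditer(text))
    result: Dict[str, str] = {}
    for i, m in enumerate(matches):
        # A section body runs until the next recognized title or end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        title = next(s for s in SECTIONS if s.lower() == m.group(1).lower())
        result[title] = text[m.end():end].strip()
    return result
```

Because matching is purely line-based, a body line that happens to begin with a section title would be misread as a new section. A production parser would need to be adapted to the model's actual output format.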
Limitations: The model has not undergone formal clinical validation and supports English only. Its accuracy on real-world or noisy audio may vary because the audio projector was trained on synthetic speech. It is not intended for direct patient-facing use or autonomous clinical decision-making.