ClinicDx V1: Multimodal Clinical Decision Support
ClinicDx V1 is an open-source, trimodal inference system for edge clinical AI, developed by ClinicDx1. It combines a medical ASR encoder, a learned audio projector, and a fine-tuned 4.3-billion-parameter clinical LLM (based on google/medgemma-4b-it) in a single llama.cpp binary, enabling fully offline deployment on consumer hardware.
Key Capabilities
- Structured Clinical Assessments: Generates 6-section responses including Alert Level, Clinical Assessment, Differential Considerations, Recommended Actions, Safety Alerts, and Key Points.
- Retrieval-Augmented Generation (RAG): Integrates a knowledge base (who_knowledge_vec_v2.mv2) of 27,860 chunks from WHO and MSF clinical guidelines, using a multi-turn ReAct loop for dynamic retrieval.
- Voice-Driven Input: Features a MedASR encoder and a lightweight AudioProjector (11.8M trainable parameters) to process audio inputs, enabling voice-to-CDS inference by reusing Gemma3's image token injection mechanism.
- Offline Deployment: Designed for full offline operation, making it suitable for low-resource clinical settings.
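The multi-turn ReAct retrieval listed above can be pictured as a loop that alternates model turns with knowledge-base lookups. The following is a minimal Python sketch, not the project's implementation: `react_loop`, the `SEARCH:` prefix, the `generate` and `search` callables, and the turn budget are all hypothetical names chosen for illustration, since the actual prompt format and retrieval API are not described here.

```python
from typing import Callable, List


def react_loop(
    question: str,
    generate: Callable[[str], str],      # hypothetical: prompt text -> model output
    search: Callable[[str], List[str]],  # hypothetical: query -> guideline chunks
    max_turns: int = 4,
) -> str:
    """Alternate model turns and retrieval until the model stops asking to search."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = generate(transcript).strip()
        transcript += step + "\n"
        if step.startswith("SEARCH:"):
            # Assumed convention: the model requests retrieval with a "SEARCH:" line.
            query = step[len("SEARCH:"):].strip()
            chunks = search(query)
            transcript += "Observation: " + " | ".join(chunks) + "\n"
        else:
            # Anything that is not a search request is treated as the final answer.
            return step
    # Turn budget exhausted: ask for an answer from the evidence gathered so far.
    return generate(transcript + "Answer now using the evidence above.\n").strip()


# Stub components so the sketch runs without a model or a knowledge base.
def fake_generate(prompt: str) -> str:
    if "Observation:" not in prompt:
        return "SEARCH: example guideline query"
    return "FINAL: answer grounded in the retrieved chunks"


def fake_search(query: str) -> List[str]:
    return [f"[stub chunk for: {query}]"]


answer = react_loop("Example clinical question", fake_generate, fake_search)
print(answer)
```

In a real deployment, `generate` would wrap the local model call and `search` would query the guideline index. Capping the number of turns is one way to keep latency bounded on edge hardware.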
Good For
- Clinical Decision Support: Assisting trained healthcare professionals with structured differential diagnosis generation and evidence-grounded treatment planning.
- Voice-Driven Observation Extraction: Extracting clinical observations from audio inputs in environments where manual data entry is challenging.
- Research and Development: Providing an open-source platform for further research into multimodal clinical AI, particularly for edge computing applications.
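For downstream use of the structured assessments, a consumer might split a response into its six named sections and check that none is missing. The sketch below assumes each section begins on its own line with its name, optionally wrapped in markdown markers and followed by a colon. That layout is an assumption, not a documented output format, and `parse_assessment` and `missing_sections` are hypothetical helper names.

```python
import re
from typing import Dict, List

# Section names as listed under Key Capabilities.
SECTIONS = [
    "Alert Level",
    "Clinical Assessment",
    "Differential Considerations",
    "Recommended Actions",
    "Safety Alerts",
    "Key Points",
]

# Assumed heading layout: optional "#"/"*" markers, the section name,
# an optional colon, then optional inline content on the same line.
_HEADING = re.compile(
    r"^[#*\s]*(" + "|".join(re.escape(s) for s in SECTIONS) + r")[*\s]*:?[*\s]*(.*)$"
)


def parse_assessment(text: str) -> Dict[str, str]:
    """Split a response into {section name: body text}."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        match = _HEADING.match(line)
        if match:
            current = match.group(1)
            sections[current] = [match.group(2)] if match.group(2) else []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def missing_sections(parsed: Dict[str, str]) -> List[str]:
    """Return expected section names that did not appear in the response."""
    return [name for name in SECTIONS if name not in parsed]


# Placeholder response; the angle-bracket text stands in for model output.
sample = """Alert Level: <level>
Clinical Assessment:
<assessment text>
Differential Considerations:
- <item 1>
- <item 2>
Recommended Actions: <actions>
Safety Alerts: <alerts>
"""

parsed = parse_assessment(sample)
print(missing_sections(parsed))
```

Checking for missing sections before displaying a response gives the calling application a simple way to flag incomplete output for review rather than presenting it as a full assessment.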
Limitations: The model lacks formal clinical validation and is English-only. Because the audio projector was trained on synthetic speech, its accuracy on real-world or noisy audio may vary. It is not intended for direct patient-facing use or autonomous clinical decision-making.