boods/EnToFrMedicaLLM-Multilingual
The boods/EnToFrMedicaLLM-Multilingual model, developed by Brice Donald Abodo Eloundou and Valentin Malykh, is a 14 billion parameter Qwen3-based decoder specifically adapted for French medical question answering. It leverages domain-adaptive continual pre-training on a large French health corpus, followed by multi-task LoRA fine-tuning across three QA formats. This model demonstrates statistically significant improvements over the un-adapted Qwen3-14B baseline in French medical QA tasks, making it suitable for specialized French medical language processing.
Loading preview...
EnMed-Unified: A Specialized French Medical LLM
EnMed-Unified is the flagship model of the EnMed family, built upon the Qwen3-14B architecture. Developed by Brice Donald Abodo Eloundou and Valentin Malykh, this 14 billion parameter model is uniquely designed for French medical question answering through a two-stage adaptation process. It first undergoes Domain-Adaptive Continual Pre-training (DAPT) on a comprehensive French health corpus, followed by multi-task LoRA fine-tuning across three distinct QA formats.
Key Capabilities
- French Medical Multiple-Choice QA: Excels at selecting the best answer from given options, similar to medical licensing exam questions.
- French Clinical Extractive QA: Proficient in identifying and extracting verbatim answer spans from French clinical case narratives.
- French Medical Abstractive QA: Capable of generating free-form answers to open-ended French medical questions.
Performance Highlights
Evaluation against the Qwen3-14B-vanilla baseline shows 4 statistically significant wins with zero significant losses across nine independent evaluation cells (task × shot combinations). EnMed-Unified consistently outperforms or matches the reference model in MCQA and ExtQA, demonstrating its robustness and specialization. It achieves a mean MCQA accuracy of 0.575 and ExtQA F1 of 0.529, leading in global descriptive ranking with the smallest standard deviation among evaluated models.
Intended Use Cases
This model is ideal for research and development in French medical language processing, particularly for applications requiring high-fidelity question answering in a medical context. Its multi-task training approach ensures broad applicability across different QA formats without the degradation seen in single-task specialized adapters.
Important Limitations
- Research Prototype: Not validated for real clinical use or patient-facing deployment. Do not use for clinical decision support.
- French-focused: While based on a multilingual model, its DAPT targets French, and English capabilities may have drifted. Not evaluated for other languages.
- Evaluation Nuances: Some statistical wins are suggestive rather than robust after correction, and single-judge evaluation for abstractive QA may introduce biases.