EPFLiGHT/Apertus-8B-MeditronFO
EPFLiGHT/Apertus-8B-MeditronFO is an 8-billion parameter medical specialist large language model developed by EPFLiGHT. It is fine-tuned from Apertus-8B-Instruct using the Fully Open Meditron Corpus, designed for clinical applications. This model is part of the Fully Open Meditron family, emphasizing an auditable pipeline for clinical LLMs with open weights, data, and training. It demonstrates significant improvements on aggregate medical benchmarks, making it suitable for medical research and auditing clinical AI systems.
Loading preview...
Overview
EPFLiGHT/Apertus-8B-MeditronFO is an 8-billion parameter medical specialist LLM, developed by EPFLiGHT. It is built upon the Apertus-8B-Instruct base model and fine-tuned using the comprehensive Fully Open Meditron Corpus. This model is a key component of the Fully Open Meditron initiative, which aims to provide an end-to-end auditable pipeline for clinical LLMs, featuring open weights, data, and training methodologies.
Key Capabilities & Performance
- Medical Specialization: Achieves a notable +13.35 point improvement over its base model on aggregate medical benchmarks, representing the largest gain within the MeditronFO family.
- Benchmark Performance: Demonstrates strong performance across various medical benchmarks, including MedMCQA, MedQA, PubMedQA, MedXpertQA, and HealthBench Hard.
- Auditable Pipeline: Emphasizes transparency with open weights, data, and training recipes, facilitating research and auditing of clinical AI systems.
- Training Details: Fine-tuned on approximately 601k examples (~150M tokens) from the Fully Open Meditron corpus, which aggregates public medical QA datasets and clinician-vetted synthetic components.
Intended Use
- Research Only: Primarily intended for research purposes related to medical LLMs, auditing clinical AI systems, and ensuring reproducibility of the Fully Open Meditron pipeline.
- Not for Clinical Deployment: It is explicitly not validated for clinical deployment, individual patient advice, autonomous decision-making, or any other deployment-adjacent use without independent domain-specific safety evaluation.