Avicenna-8B-Base: A Specialized Medical LLM
Avicenna-8B-Base is the foundational model of the Avicenna Project, developed by salihfurkaan and designed specifically for clinical reasoning at the 8-billion-parameter scale. Its specialized capabilities come from a "surgical merge" architecture that differs significantly from standard, uniform model-merging techniques.
Key Architectural Innovation: The Surgical Merge
Unlike uniform merges, Avicenna-8B-Base employs a layer-segmented DARE-TIES configuration that assigns distinct cognitive roles to different parts of the network by merging specific layers from Llama-3.1-Instruct, Hermes-3, and Aloe-Beta. This structure is designed to prevent catastrophic forgetting of general logic while injecting dense medical knowledge into the deep layers.
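As a rough illustration, a layer-segmented DARE-TIES merge of this kind could be described with a mergekit-style configuration along the following lines. Every layer range, density, weight, and repository name below is an illustrative assumption, not the published Avicenna recipe:

```python
# Hypothetical sketch of a layer-segmented DARE-TIES merge, expressed as a
# mergekit-style config dict. All numbers and model IDs are placeholders.

ILLUSTRATIVE_SLICES = [
    # Shallow layers: weighted toward general instruction-following/logic.
    {"layer_range": [0, 8],
     "sources": [
         {"model": "meta-llama/Llama-3.1-8B-Instruct", "weight": 0.7},
         {"model": "NousResearch/Hermes-3-Llama-3.1-8B", "weight": 0.3},
     ]},
    # Deep layers: weighted toward the medical specialist (Aloe-Beta).
    {"layer_range": [8, 32],
     "sources": [
         {"model": "HPAI-BSC/Llama3.1-Aloe-Beta-8B", "weight": 0.8},
         {"model": "meta-llama/Llama-3.1-8B-Instruct", "weight": 0.2},
     ]},
]

merge_config = {
    "merge_method": "dare_ties",   # DARE delta-pruning + TIES sign election
    "base_model": "meta-llama/Llama-3.1-8B-Instruct",
    "parameters": {"density": 0.6},  # fraction of delta weights kept (illustrative)
    "slices": ILLUSTRATIVE_SLICES,
    "dtype": "bfloat16",
}
```

The key point the sketch captures is that, unlike a uniform merge, each layer range gets its own source weighting, so different depths of the network inherit from different parent models.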
Benchmark Performance
The model performs strongly on medical benchmarks, including MedQA (USMLE), MMLU-Medical, and MedMCQA. Notably, it reaches 61.0% on MedQA (USMLE) and 69.5% on MMLU-Medical when using its recommended Self-Consistency Ensembling (SC) inference strategy. This is competitive with, and in some cases surpasses, larger medical models, and on certain metrics it approaches GPT-3.5 Turbo, a notable result for an 8B-parameter model.
Recommended Inference Strategy
For optimal results, Avicenna-8B-Base should be used with a Self-Consistency Ensembling inference method: the model generates multiple independent drafts, which are then synthesized into a final consensus answer. A Python script implementing this "Mixture-of-Agents" approach for open-ended clinical queries is provided.
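The core of Self-Consistency Ensembling can be sketched in a few lines of Python. The `generate` callable and the majority-vote synthesis step below are assumptions standing in for the project's actual script, which may instead synthesize open-ended drafts with an additional model call:

```python
from collections import Counter
from typing import Callable, List

def self_consistency(generate: Callable[[str], str],
                     prompt: str,
                     n_drafts: int = 5) -> str:
    """Sketch of Self-Consistency Ensembling: sample several independent
    drafts and return the most common answer.

    `generate` stands in for a sampled model call (e.g. temperature > 0
    decoding with transformers); for open-ended clinical queries the
    real script may synthesize drafts rather than take a simple vote.
    """
    drafts: List[str] = [generate(prompt) for _ in range(n_drafts)]
    answer, _count = Counter(drafts).most_common(1)[0]
    return answer
```

For multiple-choice benchmarks such as MedQA, the drafts would be final answer letters, so the majority vote directly implements the consensus step; for free-text answers a second "aggregator" pass is the usual substitute.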
Intended Use
This model is intended for academic research, benchmarking, and decision-support prototyping in medical contexts. It is not a substitute for professional medical advice: because it can hallucinate, it must never be used for real-world patient diagnosis or treatment without qualified human supervision.