MohamedAhmedAE/distil_MedGemma_4B_Llama-3.2-1B
MohamedAhmedAE/distil_MedGemma_4B_Llama-3.2-1B is a 1.24 billion parameter medical language model distilled from Google's MedGemma-4B into a Llama-3.2-1B base. Developed by Mohamed Abo El-Enen et al. for the 2025 IEEE ICICIS conference, this model specializes in medical NLP tasks. It achieves 47.7% average accuracy on MMLU-Medical and demonstrates efficient inference at 59.5 tokens/second, making it suitable for lightweight medical applications and research into cross-architecture knowledge distillation.
Overview
This model, distil_MedGemma_4B_Llama-3.2-1B, is a 1.24 billion parameter medical language model created through knowledge distillation. It was developed by Mohamed Abo El-Enen, Sally Saad, and Taymoor Nazmy, and is detailed in their 2025 IEEE paper, "DistilLLM-Med: A Lightweight Medical Language Model through Knowledge Distillation." The model distills medical expertise from the larger MedGemma-4B (a 4.97B parameter teacher) into a smaller Llama-3.2-1B student base.
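For readers who want to try the checkpoint, the following is a minimal loading sketch and not an official usage recipe. It assumes the weights are published on the Hugging Face Hub under the model ID above in a format that the `transformers` causal-LM classes can load (plausible for a Llama-3.2-1B student, but not confirmed here). The helper name `generate_answer` and the prompt layout are hypothetical.

```python
# Minimal sketch: load the distilled student and generate text.
# Assumption: the checkpoint is loadable with transformers' Auto classes.
MODEL_ID = "MohamedAhmedAE/distil_MedGemma_4B_Llama-3.2-1B"


def generate_answer(prompt, max_new_tokens=128):
    """Hypothetical helper: return the model's continuation of `prompt`."""
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize to PyTorch tensors and generate a bounded number of new tokens.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_answer("Question: What is the first-line treatment for ...?\nAnswer:"))
```

Loading the model and tokenizer inside the helper keeps the sketch short; a real script would load them once and reuse them across calls.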
Key Capabilities & Features
- Efficient Medical NLP: Achieves 47.7% average accuracy on MMLU-Medical and retains 89.3% of the teacher's token-level accuracy with about 75% fewer parameters (1.24B vs. 4.97B).
- High Inference Throughput: Operates at 59.5 tokens/second, making it 42.1% faster than its MedGemma-4B teacher.
- Cross-Tokenizer Distillation: Uses a learnable vocabulary-projection layer to distill knowledge between models with different tokenizers (from Gemma's 262K-token vocabulary to Llama-3's 128K-token vocabulary), preserving 98.7% of teacher information.
- Advanced Distillation Techniques: Employs temperature-scaled KL-divergence, progressive temperature scheduling, specialty-weighted losses, and attention-map alignment.
- Comprehensive Training Data: Trained on a unified corpus of 1.64 million samples from 18 established medical benchmarks, including MMLU (medical subtasks), PubMedQA, and clinical dialogues.
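The distillation ingredients listed above can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation. It assumes the common Hinton-style loss, T² · KL(teacher ‖ student), a fixed row-stochastic projection matrix standing in for the learnable vocabulary-projection layer, and a linear temperature schedule. All function names, the 4.0 to 1.0 temperature range, and the tiny vocabularies are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_student_vocab(teacher_probs, projection):
    """Map a distribution over the teacher vocabulary onto the student vocabulary.

    Assumption: `projection` has one row per teacher token, and each row is a
    non-negative weight vector over student tokens that sums to 1, so the
    result is again a probability distribution.
    """
    student_size = len(projection[0])
    return [
        sum(p * row[j] for p, row in zip(teacher_probs, projection))
        for j in range(student_size)
    ]


def kd_loss(teacher_logits, student_logits, temperature, projection):
    """T^2 * KL(projected teacher || student) for a single token position."""
    p = project_to_student_vocab(softmax(teacher_logits, temperature), projection)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def linear_temperature(step, total_steps, t_start=4.0, t_end=1.0):
    """Hypothetical progressive schedule: anneal T linearly from t_start to t_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)


# Toy setup: 3 teacher tokens, 2 student tokens.
TOY_PROJECTION = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
toy_loss = kd_loss([2.0, 0.5, 0.1], [0.3, 1.2], linear_temperature(0, 100), TOY_PROJECTION)
```

In a real training loop the projection would be a trained parameter and the loss would be averaged over token positions and combined with the other terms (specialty weighting, attention-map alignment), none of which this sketch attempts.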
Intended Use & Limitations
- Research Focus: Primarily intended for research in efficient and lightweight medical NLP, particularly for studying cross-tokenizer/cross-architecture knowledge distillation.
- Not for Clinical Use: The model is not a certified clinical tool. Expert review found critical factual errors in approximately 21% of sampled answers, so every output requires review by a qualified professional.
- Research Checkpoint: Represents a research checkpoint with ~0.5 epoch of training, not a fully converged production model.
- Inherited Biases: May inherit biases from its teacher model (MedGemma-4B) and the training corpus.