Overview

This model, distil_MedGemma_4B_Llama-3.2-1B, is a 1.24 billion parameter medical language model created through knowledge distillation. It was developed by Mohamed Abo El-Enen, Sally Saad, and Taymoor Nazmy, and is detailed in their 2025 IEEE paper, "DistilLLM-Med: A Lightweight Medical Language Model through Knowledge Distillation." The model distills medical expertise from the larger MedGemma-4B (a 4.97B parameter teacher) into a smaller Llama-3.2-1B student base.

Key Capabilities & Features

Efficient Medical NLP: Achieves 47.7% average accuracy on MMLU-Medical, retaining 89.3% of teacher token-level accuracy while reducing parameters by 75% compared to the teacher.
High Inference Throughput: Generates 59.5 tokens/second, 42.1% faster than its MedGemma-4B teacher.
Cross-Tokenizer Distillation: Uses a learnable vocabulary-projection layer to distill knowledge between models with different tokenizers (from Gemma's 262K vocabulary to Llama-3's 128K vocabulary), with a reported 98.7% of teacher information preserved.
Advanced Distillation Techniques: Employs temperature-scaled KL-divergence, progressive temperature scheduling, specialty-weighted losses, and attention-map alignment.
Comprehensive Training Data: Trained on a unified corpus of 1.64 million samples from 18 established medical benchmarks, including MMLU (medical subtasks), PubMedQA, and clinical dialogues.
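
The distillation ingredients listed above (temperature-scaled KL-divergence, a vocabulary projection between tokenizers, and a temperature schedule) can be illustrated with a small standard-library sketch. This is not the authors' implementation. The function names, the fixed row-stochastic projection matrix (a stand-in for the learnable layer), the choice to project probabilities rather than logits, and the linear 4.0 to 1.0 schedule are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def project_vocab(teacher_probs, projection):
    """Map a teacher distribution onto the student vocabulary.

    projection[i][j] is a hypothetical weight from teacher token i to
    student token j. Each row sums to 1, so the result is a distribution.
    In a real system this matrix would be learned, not fixed.
    """
    out = [0.0] * len(projection[0])
    for p, row in zip(teacher_probs, projection):
        for j, w in enumerate(row):
            out[j] += p * w
    return out

def kd_loss(student_logits, teacher_logits, projection, temperature):
    """KL(teacher || student) at the given temperature, scaled by T^2."""
    teacher = project_vocab(softmax(teacher_logits, temperature), projection)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
    return temperature ** 2 * kl

def temperature_at(step, total_steps, t_start=4.0, t_end=1.0):
    """Assumed linear schedule: start soft (high T), end sharp (low T)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

# Toy example: a 3-token teacher vocabulary mapped onto 2 student tokens.
toy_projection = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
toy_loss = kd_loss([0.2, -0.1], [1.0, 0.5, -0.5], toy_projection,
                   temperature_at(0, 100))
```

The T^2 factor is the usual correction that keeps gradient magnitudes comparable as the temperature changes. The loss is zero exactly when the student distribution matches the projected teacher distribution.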

Intended Use & Limitations

Research Focus: Primarily intended for research in efficient and lightweight medical NLP, particularly for studying cross-tokenizer/cross-architecture knowledge distillation.
Not for Clinical Use: The model is not a certified clinical tool. Expert review found critical factual errors in approximately 21% of sampled answers, so every output requires review by a qualified human.
Research Checkpoint: This is a research checkpoint trained for roughly 0.5 epoch, not a fully converged production model.
Inherited Biases: May inherit biases from its teacher model (MedGemma-4B) and the training corpus.
