MohamedAhmedAE/distil_med42_8B_Llama-3.2-1B-Instruct

TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 9, 2026License:llama3.2Architecture:Transformer Cold

MohamedAhmedAE/distil_med42_8B_Llama-3.2-1B-Instruct is a 1.24 billion parameter instruction-tuned medical chat model, distilled from the 8.03 billion parameter Llama3-Med42-8B teacher into a Llama-3.2-1B-Instruct student base. Developed by Mohamed Abo El-Enen et al., this model leverages knowledge distillation techniques to create a lightweight medical assistant with a 32768 token context length. It is optimized for conversational medical queries, retaining 89.3% of the teacher's token-level accuracy while significantly reducing parameter count for efficient deployment.

Loading preview...

Overview

MohamedAhmedAE/distil_med42_8B_Llama-3.2-1B-Instruct is a 1.24 billion parameter instruction-tuned medical chat model, developed by Mohamed Abo El-Enen et al. and published in the 2025 IEEE ICICIS conference. It is a distilled version of the 8.03 billion parameter Llama3-Med42-8B teacher model, built upon a Llama-3.2-1B-Instruct student base. This model is designed as a compact medical assistant, maintaining a usable chat template for conversational prompting.

Key Capabilities & Distillation Method

  • Knowledge Distillation: Utilizes temperature-scaled KL-divergence, specialty-weighted losses, and attention-map alignment to transfer medical expertise from a larger teacher model.
  • Efficiency: Achieves 89.3% of the teacher's token-level accuracy while reducing parameters by 75%, resulting in a lightweight model suitable for resource-constrained environments.
  • Medical Specialization: Trained on a unified corpus of 18 medical benchmarks (1.64M samples), including MMLU medical subtasks, PubMedQA, and clinical dialogues.
  • Performance: Reaches 47.7% average accuracy on MMLU-Medical, representing a 20.5% relative improvement over the base LLaMA 3.2-1B, with an inference speed of 59.5 tokens/sec.

Intended Use & Limitations

  • Use Cases: Ideal for research in efficient medical chat assistants, knowledge-distillation studies, and edge/low-resource deployment experiments.
  • Critical Limitation: The model is not a certified clinical tool. Expert review found critical factual errors in approximately 21% of sampled answers, necessitating qualified human oversight for any output. It should be treated as a research checkpoint, not a production-ready diagnostic tool.