MohamedAhmedAE/distil_llama_3_8B_Llama-3.2-1B
MohamedAhmedAE/distil_llama_3_8B_Llama-3.2-1B is a 1-billion-parameter language model distilled from Meta-Llama-3-8B into a Llama-3.2-1B student and trained on a medical corpus. It serves as a research control for evaluating how a general-purpose teacher affects medical-domain knowledge distillation. It is designed for studying whether medical competence arises from training on specialized text or from a teacher's inherent domain expertise, not for use as a standalone medical assistant.
Overview
This model, distil_llama_3_8B_Llama-3.2-1B, is a 1-billion-parameter student distilled from the general-purpose Meta-Llama-3-8B teacher into a Llama-3.2-1B base. It was trained on the same medical corpus as the other models in the "DistilLLM-Med" project, which focuses on creating lightweight medical language models through knowledge distillation. Unlike the primary experiments in the associated paper, this checkpoint uses a teacher that is not medically specialized, which makes it a control (ablation) for research into the sources of medical-domain competence in distilled models.
Key Distillation Methodologies
The distillation process leverages several techniques:
- Temperature-scaled KL-divergence distillation to transfer knowledge from the teacher's logits.
- Progressive temperature scheduling, which changes the distillation temperature over the course of training for gradual knowledge transfer.
- Specialty-weighted loss to re-balance KL loss across medical subdomains, even with a general-purpose teacher.
- Attention-map alignment using Frobenius-norm loss to align student and teacher attention patterns.
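The techniques listed above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the project's training code: the linear schedule, its endpoints (`t_start`, `t_end`), the scalar `specialty_weight`, the coefficient `attn_coeff`, and all function names are hypothetical, and the card does not say whether the attention term uses the Frobenius norm or its square (the plain norm is used here).

```python
import math

def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def kd_kl_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2

def linear_temperature(step, total_steps, t_start=4.0, t_end=1.0):
    """Hypothetical progressive schedule: linear change from t_start to t_end."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

def attention_alignment_loss(student_attn, teacher_attn):
    """Frobenius norm of the difference between two attention maps (lists of rows)."""
    sq = sum((s - t) ** 2
             for s_row, t_row in zip(student_attn, teacher_attn)
             for s, t in zip(s_row, t_row))
    return math.sqrt(sq)

def total_loss(student_logits, teacher_logits, student_attn, teacher_attn,
               step, total_steps, specialty_weight=1.0, attn_coeff=0.1):
    """Specialty-weighted KD term plus the attention-alignment term (assumed form)."""
    t = linear_temperature(step, total_steps)
    kd = specialty_weight * kd_kl_loss(student_logits, teacher_logits, t)
    return kd + attn_coeff * attention_alignment_loss(student_attn, teacher_attn)
```

In this sketch, a sample from an under-represented medical subdomain would be passed with a `specialty_weight` above 1.0 so that its KL term counts for more. How the project actually derives its per-specialty weights is not described here.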
Training Data
The model was trained on a medical corpus, MohamedAhmedAE/Med_LLaMa3_fine-tuning_dataset, which merges 18 established medical benchmarks (e.g., MMLU-Medical, MedMCQA, PubMedQA). The dataset comprises approximately 1.64 million training samples and was preprocessed with deduplication and token truncation.
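The two preprocessing steps mentioned above might look roughly like the following. This is a minimal sketch, not the dataset's actual pipeline: the function name `preprocess`, the `max_tokens` value, the case- and whitespace-insensitive duplicate check, and the use of whitespace splitting in place of the real tokenizer are all assumptions made for illustration.

```python
def preprocess(samples, max_tokens=512):
    """Drop duplicate samples, then truncate each survivor to max_tokens tokens.

    Whitespace splitting stands in for the model tokenizer (an assumption).
    """
    seen = set()
    cleaned = []
    for text in samples:
        # Normalise case and whitespace so near-identical copies collide.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        tokens = text.split()
        cleaned.append(" ".join(tokens[:max_tokens]))
    return cleaned
```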
Intended Use
This model is primarily intended as a research ablation/control checkpoint for investigating the effect of teacher domain expertise in medical knowledge distillation. It is not designed as a standalone medical assistant and is not validated for clinical decision-making.