Name: MohamedAhmedAE/distil_llama_3_8B_Llama-3.2-1B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: MohamedAhmedAE

Overview

distil_llama_3_8B_Llama-3.2-1B is a 1-billion-parameter student model distilled from the general-purpose Meta-Llama-3-8B teacher into a Llama-3.2-1B base. It was trained on the same medical corpus as the other models in the "DistilLLM-Med" project, which builds lightweight medical language models through knowledge distillation. Unlike the primary experiments in the associated paper, this checkpoint uses a teacher that is not medically specialized, so it serves as a control (ablation) for studying where the medical domain competence of distilled models comes from.

Key Distillation Methodologies

The distillation process combines four techniques:

Temperature-scaled KL-divergence distillation to transfer knowledge from the teacher's logits.
Progressive temperature scheduling for a gradual knowledge transfer.
Specialty-weighted loss to re-balance KL loss across medical subdomains, even with a general-purpose teacher.
Attention-map alignment using Frobenius-norm loss to align student and teacher attention patterns.
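
The four techniques listed above can be illustrated with a minimal, dependency-free sketch. It is not the project's training code: the linear temperature schedule, the start and end temperatures, the T^2 scaling of the KL term, the per-sample specialty weight, and the attn_coeff mixing factor are all assumptions made for illustration, and the attention matrices are assumed to have already been mapped to the same shape (the real teacher and student differ in layer and head counts).

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def kd_kl_loss(teacher_logits, student_logits, temperature):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the common convention for keeping gradient
    magnitudes comparable across temperatures (assumed here).
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2

def temperature_at(step, total_steps, t_start=4.0, t_end=1.0):
    """Hypothetical linear schedule from t_start down to t_end."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

def attention_frobenius_loss(student_attn, teacher_attn):
    """Squared Frobenius norm of the difference of two same-shape matrices."""
    return sum(
        (s - t) ** 2
        for s_row, t_row in zip(student_attn, teacher_attn)
        for s, t in zip(s_row, t_row)
    )

def total_loss(teacher_logits, student_logits, student_attn, teacher_attn,
               step, total_steps, specialty_weight=1.0, attn_coeff=0.1):
    """Specialty-weighted KD loss plus an attention-alignment term.

    specialty_weight stands in for a per-subdomain weight (hypothetical);
    attn_coeff is a hypothetical mixing coefficient.
    """
    t = temperature_at(step, total_steps)
    kd = specialty_weight * kd_kl_loss(teacher_logits, student_logits, t)
    return kd + attn_coeff * attention_frobenius_loss(student_attn, teacher_attn)
```

With identical teacher and student logits the KL term is zero, so any remaining loss comes from the attention term. A real implementation would operate on batched tensors and average over tokens.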

Training Data

The model was trained on a medical corpus, MohamedAhmedAE/Med_LLaMa3_fine-tuning_dataset, which merges 18 established medical benchmarks (e.g., MMLU-Medical, MedMCQA, PubMedQA). The dataset contains approximately 1.64 million training samples, preprocessed with deduplication and token truncation.
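
The source does not describe the preprocessing in detail. As a rough sketch of what deduplication followed by token truncation can look like, the example below drops samples that match an earlier one after case and whitespace normalization, then caps each sample at a token budget. The function name preprocess, the max_tokens value, the exact-match deduplication rule, and the whitespace tokenizer (a stand-in for the actual Llama tokenizer) are all assumptions.

```python
def preprocess(samples, max_tokens=512, tokenize=str.split):
    """Deduplicate samples, then truncate each one to max_tokens tokens.

    Assumptions: duplicates are detected by exact match after lowercasing
    and whitespace normalization; whitespace splitting stands in for the
    real subword tokenizer.
    """
    seen = set()
    cleaned = []
    for text in samples:
        key = " ".join(text.lower().split())
        if key in seen:
            continue  # skip a sample already seen in normalized form
        seen.add(key)
        tokens = tokenize(text)[:max_tokens]
        cleaned.append(" ".join(tokens))
    return cleaned
```

For example, preprocess(["What is  aspirin?", "what is aspirin?"]) keeps only the first sample, since the second normalizes to the same key.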

Intended Use

This model is primarily intended as a research ablation/control checkpoint for investigating the effect of teacher domain expertise in medical knowledge distillation. It is not designed as a standalone medical assistant and is not validated for clinical decision-making.

Full Model Card (README)