UMCU/CardioLlama.nl_clinical

Text generation | Concurrency cost: 1 | Model size: 1B | Quant: BF16 | Context length: 32k | Published: Oct 29, 2025 | License: llama3.2 | Architecture: Transformer

UMCU/CardioLlama.nl_clinical is a Llama-3.2-1B-Instruct model that has undergone domain-adapted continuous pre-training on a Dutch medical corpus weighted toward cardiology. The 1-billion-parameter model targets medical language understanding in Dutch, particularly in cardiology, and is intended for specialized clinical applications. Its training included a phase on 5 million cardiology records from UMCU.


Overview

UMCU/CardioLlama.nl_clinical is a specialized language model built on Llama-3.2-1B-Instruct, with 1 billion parameters. Its core distinction is domain-adapted pre-training (DAPT), also known as continuous pre-training (CPT), on a Dutch medical corpus. The training data was deliberately weighted toward cardiology, which makes the model most relevant for applications in that field.

Key Capabilities

  • Domain-Specific Understanding: Adapted to comprehend and generate text in the Dutch medical domain, particularly cardiology.
  • Continuous Pre-training: Underwent an initial full epoch of training on a general Dutch medical corpus, followed by further pre-training on 5 million cardiology records from UMCU.
  • Perplexity: Achieved a perplexity of approximately 4 on the validation set (lower is better), indicating strong language modeling performance within its specialized domain.
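
To make the perplexity figure concrete, here is a minimal sketch of how perplexity relates to per-token log-probabilities. It assumes the standard definition (the exponential of the mean negative log-likelihood, in nats); the page does not say how the reported value was computed, and the input values below are hypothetical, not taken from the model's validation set.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean log-probability per token), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: every token is assigned probability 0.25,
# so the mean negative log-likelihood is ln(4), about 1.386 nats.
example_log_probs = [math.log(0.25)] * 8
print(perplexity(example_log_probs))  # approximately 4
```

Under this definition, a perplexity near 4 means the model is, on average, about as uncertain as if it were choosing uniformly among four equally likely next tokens.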

Good for

  • Dutch Medical NLP: Suited to tasks that require understanding or generating Dutch medical text.
  • Cardiology Applications: Particularly well-suited for use cases in cardiology, because its training data is weighted toward that domain.
  • Research: Useful for researchers exploring domain adaptation techniques in LLMs for specialized medical fields.
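
For readers who want to try the model, the following is a minimal sketch of loading it for plain text completion with the Hugging Face `transformers` library. It assumes that the repository name shown on this page resolves on the Hugging Face Hub, that access to it has been granted if the repository is gated, and that `torch` and `transformers` are installed. The helper name `generate_text` and the Dutch prompt are illustrative only and do not come from the model authors.

```python
MODEL_ID = "UMCU/CardioLlama.nl_clinical"  # repository name as shown on this page


def generate_text(prompt, max_new_tokens=40):
    """Greedy completion of `prompt` (hypothetical helper, not an official API)."""
    # Local imports keep this module importable without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model; assumes hardware support.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_text("De patiënt werd opgenomen met"))
```

Plain completion is used here rather than a chat template because the model was further pre-trained on raw medical text; whether the instruction-following behavior of the base model is preserved is not stated on this page.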

If you use this model, please cite the associated work:

@misc{vanes2026languagecorporadutchmedical,
      title={Language corpora for the Dutch medical domain}, 
      author={B. van Es},
      year={2026},
      eprint={2604.25374},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.25374}, 
}