UMCU/CardioLlama.nl_clinical

Hugging Face · Text Generation

  • Model Size: 1B
  • Quantization: BF16
  • Context Length: 32k
  • Concurrency Cost: 1
  • Published: Oct 29, 2025
  • License: llama3.2
  • Architecture: Transformer

UMCU/CardioLlama.nl_clinical is a 1-billion-parameter Llama-3.2-Instruct model developed by UMCU (University Medical Center Utrecht) and optimized for the Dutch medical domain. It was continually pre-trained on a Dutch medical corpus with a particular focus on cardiology and supports a context length of 32,768 tokens. The model is intended for applications that require specialized understanding of Dutch medical and cardiology-related text.

Overview

UMCU/CardioLlama.nl_clinical is a 1-billion-parameter Llama-3.2-Instruct model that has undergone domain-adaptive pre-training (DAPT), also known as continued pre-training (CPT), on a specialized Dutch medical corpus. Training proceeded in two stages: one full epoch on the general Dutch medical corpus, followed by further pre-training on 5 million cardiology records from the UMCU, mixed with a random sample of the general corpus to guard against model collapse. The model retains the base model's context length of 32,768 tokens.
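A minimal loading-and-generation sketch with the Hugging Face transformers library is shown below. It assumes the model ships with the standard Llama-3.2 chat template; the Dutch prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "UMCU/CardioLlama.nl_clinical"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the model card lists BF16 weights
    device_map="auto",
)

# Illustrative Dutch cardiology prompt ("Summarize the following cardiology report: ...")
messages = [
    {"role": "user", "content": "Vat het volgende cardiologisch verslag samen: ..."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```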

Key Capabilities

  • Specialized Medical Understanding: Deeply adapted to Dutch medical language, with a bias towards cardiology.
  • Domain-Specific Pre-training: Leverages continued pre-training on a large, domain-relevant corpus to improve performance in its target domain.
  • Robustness: Mixes the general medical corpus back into the cardiology stage of pre-training to avoid model collapse (see the sketch after this list).
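
As a rough illustration of that mixing strategy (the function name, replay ratio, and document representation are all assumptions, not details of the authors' actual pipeline), continued pre-training data could be assembled like this:

```python
import random

def mix_corpora(cardiology_docs, general_docs, replay_ratio=0.2, seed=42):
    """Interleave domain records with a random sample of the general corpus.

    replay_ratio is a hypothetical value: the model card only says a random
    selection of the general corpus was mixed in, not how much.
    """
    rng = random.Random(seed)
    n_replay = min(int(len(cardiology_docs) * replay_ratio), len(general_docs))
    replay = rng.sample(general_docs, n_replay)
    mixed = list(cardiology_docs) + replay
    rng.shuffle(mixed)
    return mixed

# Toy usage with placeholder strings standing in for clinical records.
print(len(mix_corpora(["ecg verslag"] * 100, ["algemeen verslag"] * 1000)))
```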

Good For

  • Dutch Medical Text Analysis: Well suited to processing and understanding medical documents written in Dutch.
  • Cardiology-Specific Applications: Particularly strong within the cardiology domain, given its focused pre-training.
  • Research and Development: A solid base for further fine-tuning or research in Dutch clinical NLP (see the LoRA sketch below).
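
For the fine-tuning route, a hedged starting point with the peft library might look like the following; the LoRA hyperparameters and target modules are illustrative defaults, not values recommended by UMCU.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical LoRA configuration for downstream fine-tuning.
# r, lora_alpha, and target_modules are illustrative, not from the model card.
base = AutoModelForCausalLM.from_pretrained("UMCU/CardioLlama.nl_clinical")
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Training the wrapped model with a standard transformers Trainer then updates only the adapter weights, which keeps experiments cheap at this 1B scale.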