BramVanroy/fietje-2

License: MIT

Fietje 2: An Efficient Dutch LLM

Fietje 2 is a 2.7-billion-parameter language model developed by BramVanroy, built on the microsoft/phi-2 architecture. Its defining feature is its specialization in Dutch, achieved through continued pretraining on 28 billion Dutch tokens drawn from Dutch Wikipedia and a quality-filtered subset of CulturaX.
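
A minimal usage sketch with the Hugging Face transformers library is shown below. The prompt and generation settings are illustrative, and loading in bfloat16 is an assumption chosen to keep memory usage low, not a requirement.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "BramVanroy/fietje-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# bfloat16 and device_map="auto" are assumptions to keep the memory
# footprint small; full precision on CPU also works.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Fietje 2 is a base model, so prompt it with plain Dutch text to be
# completed rather than with a chat template.
prompt = "Het voordeel van een klein taalmodel is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```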

Key Capabilities

  • Dutch Text Generation: Optimized for generating coherent, contextually relevant Dutch text.
  • Efficiency: Small and fast, with performance comparable to Dutch LLMs roughly twice its size, such as GEITje 7B Ultra.
  • Open-source Base: Builds on the proven phi-2 architecture, providing a strong foundation for further development.

Training Details

Fietje 2 was trained for approximately two weeks on 16 A100 80GB GPUs provided by the Flemish Supercomputer Center (VSC), using the alignment-handbook and DeepSpeed. Key hyperparameters include a learning rate of 9e-05 and a total batch size of 1920, trained for a single epoch.
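
As a rough sketch, these settings could be expressed through the standard transformers TrainingArguments API, which the alignment-handbook builds on. The per-device batch size and gradient-accumulation split, the precision flag, and the DeepSpeed config path below are assumptions, not values reported for Fietje 2.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters. The split is an
# assumption: 16 GPUs x 40 per device x 3 accumulation steps reproduces
# the reported total batch size of 1920.
training_args = TrainingArguments(
    output_dir="fietje-2",
    learning_rate=9e-05,
    num_train_epochs=1.0,
    per_device_train_batch_size=40,  # assumed split
    gradient_accumulation_steps=3,   # assumed split
    bf16=True,                       # assumed precision
    deepspeed="ds_config.json",      # hypothetical DeepSpeed config path
)
```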

Intended Uses & Limitations

This model is suitable for various Dutch language processing tasks where efficiency is crucial. However, it shares the inherent limitations of LLMs, including the potential for hallucination and errors. Users should exercise caution and validate outputs, especially in critical applications.

Good for

  • Applications requiring efficient Dutch language generation.
  • Researchers and developers working on Dutch NLP tasks.
  • Use cases where a smaller model footprint is advantageous without a significant loss of quality on Dutch content.