BramVanroy/fietje-2

Hugging Face
Text generation | Model size: 3B | Precision: BF16 | Context length: 2k | Published: Apr 9, 2024 | License: MIT | Architecture: Transformer | Open weights

Fietje 2 is a 2.7 billion parameter causal language model developed by Bram Vanroy and adapted from Microsoft's Phi-2. It is tailored to Dutch text generation through continued pretraining on 28 billion Dutch tokens. Despite its small size, it performs close to larger Dutch LLMs, which makes it suitable for a range of Dutch language processing tasks.


Fietje 2: An Efficient Dutch LLM

Fietje 2, developed by Bram Vanroy, is a 2.7 billion parameter language model based on the microsoft/phi-2 architecture. It was adapted to Dutch through continued pretraining on 28 billion Dutch tokens, including a substantial amount of Dutch Wikipedia and CulturaX data, to excel in Dutch text generation.

Key Capabilities & Features

  • Dutch Language Proficiency: Optimized for generating and understanding Dutch text through extensive pretraining on a high-quality Dutch dataset.
  • Efficiency: Despite its relatively small size (2.7B parameters), Fietje 2 performs nearly on par with larger Dutch LLMs such as GEITje 7B Ultra, a model more than twice its size, making it a lighter-weight alternative.
  • Open-source Foundation: Built on the MIT-licensed phi-2 model and shares its Transformer architecture.
  • Multiple Versions: Available in base, instruct, and chat variants to suit different application needs.
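The variants are published as separate checkpoints. The sketch below assumes the instruct and chat variants follow the base model's naming (`BramVanroy/fietje-2-instruct`, `BramVanroy/fietje-2-chat`); check the exact IDs on Hugging Face before relying on them. `repo_id_for` and `load_generator` are hypothetical helper names. Loading uses the standard `transformers` `pipeline` API in BF16 to match the listed precision, and `device_map="auto"` additionally requires the `accelerate` package.

```python
# Hypothetical variant -> repo ID mapping. Only the base ID appears in the card;
# the instruct and chat IDs are assumed to follow the same naming pattern.
VARIANTS = {
    "base": "BramVanroy/fietje-2",
    "instruct": "BramVanroy/fietje-2-instruct",
    "chat": "BramVanroy/fietje-2-chat",
}


def repo_id_for(variant: str) -> str:
    """Return the Hugging Face repo ID for a Fietje 2 variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; choose from {sorted(VARIANTS)}")
    return VARIANTS[variant]


def load_generator(variant: str = "base"):
    """Build a text-generation pipeline; weights are downloaded on first use."""
    # Imported lazily so the mapping above can be used without torch installed.
    import torch
    from transformers import pipeline

    return pipeline(
        "text-generation",
        model=repo_id_for(variant),
        torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
        device_map="auto",  # places layers on available devices (needs accelerate)
    )
```

Typical use would be `generator = load_generator("instruct")` followed by `generator("Schrijf een korte zin over fietsen.", max_new_tokens=40)`. Keeping the heavy imports inside `load_generator` means choosing a variant stays cheap until a model is actually needed.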

Training Details

Fietje 2 was trained for approximately two weeks on 16 A100 80GB GPUs using the alignment-handbook and DeepSpeed. Training used a learning rate of 9e-05 and a total batch size of 1920. To keep data quality high, the training corpus was carefully filtered.
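The batch figures above can be put in perspective with some simple arithmetic. Under data parallelism, the total batch size is the per-device batch times gradient-accumulation steps times the number of GPUs, so 1920 across 16 GPUs means 120 sequences per GPU per optimizer step. How those 120 were split between per-device batch and gradient accumulation is not stated in the card. The token counts below further assume that every sequence fills phi-2's 2,048-token context and that the 28B tokens are seen once; both are assumptions for illustration only.

```python
import math

# Figures from the model card.
TOTAL_BATCH_SIZE = 1920
NUM_GPUS = 16
CORPUS_TOKENS = 28_000_000_000

# Assumption (not stated in the card): every sequence is packed to the full
# 2,048-token context length of phi-2.
SEQ_LEN = 2048


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Number of sequences contributing to one optimizer step under data parallelism."""
    return per_device * grad_accum * num_gpus


per_gpu_share = TOTAL_BATCH_SIZE // NUM_GPUS  # sequences per GPU per optimizer step
tokens_per_step = TOTAL_BATCH_SIZE * SEQ_LEN  # tokens per optimizer step
steps_per_pass = math.ceil(CORPUS_TOKENS / tokens_per_step)  # steps for one pass over 28B tokens

print(f"per-GPU share:   {per_gpu_share} sequences")
print(f"tokens per step: {tokens_per_step:,}")
print(f"steps per pass:  {steps_per_pass:,}")
```

Any split such as 40 sequences per device with 3 accumulation steps (a hypothetical choice) reproduces the same total of 1920; under the packing assumption, one pass over the corpus works out to roughly 7,121 optimizer steps of about 3.9M tokens each.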

Intended Uses & Limitations

Fietje 2 is designed for Dutch language generation tasks. As with its base model, phi-2, users should be aware of general LLM limitations, including the potential for hallucinations and inaccuracies.
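For Dutch generation, a typical call passes a prompt to a `transformers` text-generation pipeline and reads back the continuation. The sketch below assumes such a pipeline (for example `pipeline("text-generation", model="BramVanroy/fietje-2")`) is passed in as `generator`. `complete_dutch` is a hypothetical helper name, and the sampling settings (`temperature=0.7`, 64 new tokens) are illustrative choices, not values from the card. Given the hallucination risk mentioned above, factual completions are best treated as drafts to verify.

```python
from typing import Any, Callable


def complete_dutch(generator: Callable[..., Any], prompt: str, max_new_tokens: int = 64) -> str:
    """Return only the newly generated continuation for a Dutch prompt.

    `generator` is assumed to behave like a transformers text-generation
    pipeline: it returns a list of dicts with a "generated_text" key.
    """
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding
        temperature=0.7,  # illustrative value, not taken from the model card
        return_full_text=False,  # drop the prompt from the returned text
    )
    return outputs[0]["generated_text"].strip()
```

Taking the generator as an argument keeps the helper testable with a stub and lets the same function wrap the base, instruct, or chat checkpoint.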