BramVanroy/fietje-2-chat

Text generation · 2.7B parameters · BF16 · 2k context length · Published: Apr 29, 2024 · License: MIT · Architecture: Transformer

BramVanroy/fietje-2-chat is a 2.7 billion parameter DPO-tuned chat model, adapted from Microsoft's Phi-2 and tailored for Dutch text generation. It was trained on 28 billion tokens of Dutch data, making it an efficient, specialized model for Dutch language tasks. Despite its compact size, it performs comparably to larger Dutch LLMs, offering a practical option for conversational applications in Dutch.


Fietje 2 Chat: An Efficient Dutch LLM

Fietje 2 Chat is a 2.7 billion parameter language model developed by Bram Vanroy, specifically optimized for the Dutch language. It is a DPO-tuned (aligned) chat version, building upon the instruct model, which itself is an adaptation of Microsoft's Phi-2 architecture.
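As a chat model, Fietje 2 Chat expects a conversation to be serialized with a chat template before generation. The sketch below illustrates the idea in plain Python; the `<|user|>`/`<|assistant|>` markers are an assumed ChatML-style convention for illustration only, and in practice you would call the tokenizer's `apply_chat_template()` method, which applies the actual template shipped with the model repository.

```python
def build_prompt(messages):
    """Flatten a chat history into a single prompt string.

    NOTE: the role markers below are illustrative assumptions, not the
    model's real template. With transformers, use
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    to get the correct serialization for this model.
    """
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}")
    parts.append("<|assistant|>\n")  # cue the model to produce a reply
    return "\n".join(parts)

messages = [
    {"role": "user", "content": "Wat is de hoofdstad van Nederland?"},
]
print(build_prompt(messages))
```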

Key Capabilities & Features

  • Dutch Language Specialization: Tailored for Dutch text generation through training on 28 billion tokens of Dutch data.
  • Efficiency: Despite its small size (2.7B parameters), Fietje 2 Chat performs nearly on par with Dutch LLMs twice its size, such as GEITje 7B Ultra.
  • DPO-Tuned: Fine-tuned using Direct Preference Optimization (DPO) on a combination of cleaned Dutch datasets, including ultra_feedback_dutch_cleaned and orca_dpo_pairs_dutch_cleaned, totaling over 18,000 samples.
  • Chat-Optimized: Designed for conversational applications, offering an aligned model for interactive use cases.
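DPO trains the model directly on preference pairs: for each prompt it pushes the policy's implicit reward for the chosen response above that for the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair loss (variable names and the `beta=0.1` default are assumptions for illustration; the actual training used the alignment-handbook tooling):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the frozen reference (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response than the reference model does, scaled by beta.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: small when the policy already
    # prefers the chosen response, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the
# loss is -log(0.5) = log(2) ≈ 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

Minimizing this loss increases the likelihood gap between chosen and rejected responses without training a separate reward model, which is what makes DPO a lightweight alignment step on top of the instruct model.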

Training Details

The model was trained with the alignment-handbook using DeepSpeed, leveraging computational resources from the Flemish Supercomputer Center (VSC). Training ran for one epoch with a learning rate of 2e-06 and a batch size of 8.

Intended Uses & Limitations

Fietje 2 Chat is intended for Dutch language generation and conversational AI. Users should be aware of general LLM limitations, including potential for hallucinations and inaccuracies, as noted for the base Phi-2 model.