BramVanroy/fietje-2-chat

Apr 29, 2024
License: mit

Fietje 2 Chat: An Efficient Dutch LLM

Fietje 2 Chat is a 2.7-billion-parameter language model developed by Bram Vanroy, specifically optimized for Dutch. It is the DPO-tuned (aligned) chat version of the Fietje 2 instruct model, which is itself an adaptation of Microsoft's Phi-2.

Key Capabilities & Features

  • Dutch Language Specialization: Tailored for Dutch text generation through training on 28 billion tokens of Dutch data.
  • Efficiency: Despite its small size (2.7B parameters), Fietje 2 Chat performs nearly on par with Dutch LLMs twice its size, such as GEITje 7B Ultra.
  • DPO-Tuned: Fine-tuned using Direct Preference Optimization (DPO) on a combination of cleaned Dutch datasets, including ultra_feedback_dutch_cleaned and orca_dpo_pairs_dutch_cleaned, totaling over 18,000 samples.
  • Chat-Optimized: Designed for conversational applications, offering an aligned model for interactive use cases.
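To make the DPO bullet above concrete, here is a minimal sketch of the Direct Preference Optimization loss for a single preference pair. This is plain illustrative Python, not the actual training code (the real run used the alignment-handbook tooling); the argument values and `beta=0.1` default are assumptions for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model. beta
    controls how far the policy may drift from the reference.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A positive margin (policy agrees with the preference) gives a low loss;
# a negative margin (policy prefers the rejected answer) gives a high one.
low = dpo_loss(-10.0, -30.0, -20.0, -25.0)   # margin = +15
high = dpo_loss(-30.0, -10.0, -25.0, -20.0)  # margin = -15
```

Training on the cleaned Dutch preference pairs drives the model toward the chosen responses while the reference term keeps it anchored to the instruct model.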

Training Details

The model was trained with the alignment-handbook using DeepSpeed, on computational resources from the Flemish Supercomputer Center (VSC). Key hyperparameters included a learning rate of 2e-06 and a batch size of 8, trained for one epoch.
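An alignment-handbook DPO run is driven by a YAML recipe; a sketch of what such a config could look like is below. The hyperparameters and dataset names come from this card, but the field layout, the `BramVanroy/` dataset prefixes, and all other values are illustrative assumptions, not the exact file used.

```yaml
# Illustrative alignment-handbook DPO recipe (not the exact config used)
model_name_or_path: BramVanroy/fietje-2-instruct  # assumed base checkpoint
torch_dtype: bfloat16

dataset_mixer:
  # Dataset IDs assumed; the card names the datasets without a namespace
  BramVanroy/ultra_feedback_dutch_cleaned: 1.0
  BramVanroy/orca_dpo_pairs_dutch_cleaned: 1.0

learning_rate: 2.0e-06          # from the training details above
num_train_epochs: 1             # from the training details above
per_device_train_batch_size: 8  # card states "batch size of 8"
beta: 0.1                       # assumed DPO default
output_dir: data/fietje-2-chat
```

The handbook passes such a recipe to its DPO script, which wraps TRL's `DPOTrainer` under the hood.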

Intended Uses & Limitations

Fietje 2 Chat is intended for Dutch language generation and conversational AI. Users should be aware of general LLM limitations, including potential for hallucinations and inaccuracies, as noted for the base Phi-2 model.
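For conversational use, prompts must follow the model's chat template. The sketch below formats messages in the ChatML style common among Phi-2-derived chat models; whether Fietje 2 Chat uses exactly this template is an assumption here, and the tokenizer's own `apply_chat_template` method is authoritative.

```python
def format_chatml(messages):
    """Format a list of {'role': ..., 'content': ...} dicts as a
    ChatML-style prompt string.

    This mirrors what a ChatML chat template would produce with a
    generation prompt appended; verify against the actual tokenizer,
    since the exact template is an assumption.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "user", "content": "Wat is de hoofdstad van Nederland?"}
])
```

In practice you would pass the messages list to the Fietje 2 Chat tokenizer's `apply_chat_template(messages, add_generation_prompt=True)` rather than building the string by hand.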