BramVanroy/fietje-2-chat

TEXT GENERATIONConcurrency Cost:1Model Size:3BQuant:BF16Ctx Length:2kPublished:Apr 29, 2024License:mitArchitecture:Transformer0.0K Open Weights Cold

BramVanroy/fietje-2-chat is a 2.7 billion parameter DPO-tuned chat model, adapted from Microsoft's Phi-2 architecture. It is specifically tailored for Dutch text generation, having been trained on 28 billion tokens of Dutch data. This model offers an efficient solution for Dutch language applications, performing comparably to larger Dutch LLMs at half their size. Its primary strength lies in generating high-quality, aligned Dutch conversational text.

Loading preview...

Fietje 2 Chat: An Efficient Dutch LLM

Fietje 2 Chat is a 2.7 billion parameter language model developed by Bram Vanroy, specifically optimized for the Dutch language. It is a DPO-tuned (aligned) chat version, building upon the instruct variant of Fietje, which itself is an adaptation of Microsoft's Phi-2 architecture.

Key Capabilities & Differentiators

  • Dutch Language Specialization: Fietje 2 Chat was extensively trained on 28 billion tokens of Dutch text, making it highly proficient in generating Dutch content.
  • Efficiency: Despite its compact size of 2.7 billion parameters, Fietje 2 Chat demonstrates performance nearly on par with Dutch LLMs twice its size, such as GEITje 7B Ultra.
  • Alignment: This model is DPO-tuned, indicating an emphasis on generating helpful and harmless responses, making it suitable for conversational applications.
  • Open Source: The model and its training details are openly available, with a thorough description and usage examples provided in its Github repository.

Training Details

Fietje 2 Chat was fine-tuned from the instruct model using a combination of two cleaned Dutch DPO datasets: BramVanroy/ultra_feedback_dutch_cleaned and BramVanroy/orca_dpo_pairs_dutch_cleaned, totaling 18,653 samples. Training was conducted using the alignment-handbook with DeepSpeed, leveraging computational resources from the Flemish Supercomputer Center (VSC).

Intended Uses

This model is ideal for applications requiring efficient and high-quality Dutch conversational text generation. Users should be aware of general LLM limitations, including the potential for hallucinations and errors.