GEITje 7B Ultra: A Conversational Model for Dutch
GEITje 7B Ultra is a 7 billion parameter instruction/chat model developed by Bram Vanroy, specifically designed for the Dutch language. It is built upon the Mistral architecture and has undergone alignment through AI feedback using Direct Preference Optimization (DPO).
Key Capabilities & Features
- Dutch Language Specialization: Fine-tuned extensively on Dutch data, making it highly proficient in generating and understanding Dutch text.
- AI Feedback Alignment: Utilizes a synthetic DPO dataset of around 56 million tokens, generated with GPT-4-Turbo and Rijgersberg/GEITje-7B-chat, to enhance conversational quality and alignment.
- Conversational Model: Optimized for interactive chat and instruction-following in Dutch, supporting system messages through the Zephyr chat template.
- Performance: In Dutch-specific benchmarks, it generally outperforms earlier GEITje models, though the author notes these benchmarks have shortcomings and results should be interpreted with care.
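Because the model uses the Zephyr chat template, prompts follow a fixed role-marker layout. The sketch below illustrates that layout with a small helper function; in practice you would load the model's tokenizer and call `tokenizer.apply_chat_template(...)` rather than formatting by hand, and the exact marker strings shown here are assumptions based on the standard Zephyr template.

```python
# Illustrative sketch of the Zephyr-style prompt layout assumed for
# GEITje 7B Ultra. In real use, prefer:
#   tokenizer.apply_chat_template(messages, tokenize=False,
#                                 add_generation_prompt=True)
# which applies the template shipped with the model.

def format_zephyr(messages):
    """Render a list of {role, content} dicts as a Zephyr-style prompt."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}</s>\n")
    # Generation prompt: cue the model to reply as the assistant.
    parts.append("<|assistant|>\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "Je bent een behulpzame assistent."},
    {"role": "user", "content": "Wat is de hoofdstad van Nederland?"},
]
prompt = format_zephyr(messages)
print(prompt)
```

Note that system messages are supported directly, so persona or behavior instructions can be placed in the `system` turn rather than prepended to the user message.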
Training Details
The model is a DPO fine-tune of BramVanroy/GEITje-7B-ultra-sft, an instruction-tuned version of Rijgersberg/GEITje-7B, which itself was created by continued pretraining of Mistral 7B on Dutch data. It was trained in full (without LoRA) using bfloat16 and Flash Attention 2 on A100 GPUs. A hyperparameter search led to a beta value of 0.1 for DPO, which improved results compared to default settings.
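To make the training setup concrete, here is a hedged sketch of the DPO stage using TRL's `DPOTrainer`. Only `beta=0.1`, bfloat16, full (non-LoRA) training, and Flash Attention 2 come from the description above; the dataset identifier, script structure, and all other hyperparameters are assumptions, not the author's exact recipe.

```python
# Hypothetical sketch of the DPO stage with TRL; not the author's script.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "BramVanroy/GEITje-7B-ultra-sft"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,               # trained in bfloat16
    attn_implementation="flash_attention_2",  # as reported for A100 GPUs
)
tokenizer = AutoTokenizer.from_pretrained(base)

# Synthetic Dutch preference pairs (dataset name is an assumption).
train_dataset = load_dataset("BramVanroy/ultra_feedback_dutch", split="train")

args = DPOConfig(
    output_dir="geitje-7b-ultra",
    beta=0.1,   # beta from the hyperparameter search mentioned above
    bf16=True,  # full fine-tune, no LoRA adapters
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
trainer.train()
```

Because no `ref_model` is passed, TRL creates the frozen reference model from the policy checkpoint, which matches the usual DPO setup where the SFT model serves as the reference.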
Intended Uses & Limitations
GEITje 7B Ultra is suitable for conversational AI applications requiring strong Dutch language capabilities. However, because it was trained on synthetic data created with OpenAI/Azure services, the model may not be used for commercial purposes under those services' terms. It may also generate incorrect, misleading, or potentially offensive content; use it at your own risk.