BramVanroy/GEITje-7B-ultra

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · Published: Jan 27, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open Weights

BramVanroy/GEITje-7B-ultra is a 7 billion parameter conversational model for Dutch, based on Mistral and aligned through AI feedback using Direct Preference Optimization (DPO). It is fine-tuned on a synthetic DPO dataset of approximately 56 million tokens generated with GPT-4-Turbo. This model is specifically designed for Dutch language instruction and chat, outperforming previous GEITje models in Dutch-specific benchmarks.


GEITje 7B Ultra: A Conversational Model for Dutch

GEITje 7B Ultra is a 7 billion parameter instruction/chat model developed by Bram Vanroy, specifically designed for the Dutch language. It is built upon the Mistral architecture and has undergone alignment through AI feedback using Direct Preference Optimization (DPO).

Key Capabilities & Features

  • Dutch Language Specialization: Fine-tuned extensively on Dutch data, making it highly proficient in generating and understanding Dutch text.
  • AI Feedback Alignment: Utilizes a synthetic DPO dataset of around 56 million tokens, generated with GPT-4-Turbo and Rijgersberg/GEITje-7B-chat, to enhance conversational quality and alignment.
  • Conversational Model: Optimized for interactive chat and instruction-following in Dutch, supporting system messages through the Zephyr chat template.
  • Performance: In Dutch-specific benchmarks, it generally outperforms earlier GEITje models, though these benchmarks are noted to have limitations.
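Because the model uses the Zephyr chat template with system-message support, prompts follow a `<|role|>` turn structure. In practice you would call `tokenizer.apply_chat_template` from Hugging Face `transformers`; the minimal sketch below only approximates what that template produces, so treat the exact formatting as an illustration rather than the canonical implementation:

```python
def apply_zephyr_template(messages, add_generation_prompt=True):
    """Approximate the Zephyr chat template used by GEITje 7B Ultra.

    Each turn is rendered as '<|role|>\n{content}</s>\n'; when generating,
    an open '<|assistant|>\n' turn is appended for the model to complete.
    """
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    if add_generation_prompt:
        parts.append("<|assistant|>\n")
    return "".join(parts)


# Example: a Dutch system prompt plus one user turn
prompt = apply_zephyr_template([
    {"role": "system", "content": "Je bent een behulpzame AI-assistent."},
    {"role": "user", "content": "Hallo!"},
])
print(prompt)
```

With the real tokenizer, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` should yield an equivalently structured string.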

Training Details

The model is the DPO-aligned continuation of BramVanroy/GEITje-7B-ultra-sft, which in turn derives from Rijgersberg/GEITje-7B, a Mistral model further pretrained on Dutch data. Training was performed on the full model (no LoRA) in bfloat16 with Flash Attention 2 on A100 GPUs. A hyperparameter search settled on a DPO beta of 0.1, which improved results over the default settings.
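To make the role of the beta hyperparameter concrete, the per-example DPO loss compares the policy's log-probabilities on the chosen and rejected responses against a frozen reference model, with beta scaling the implicit reward margin. This is a generic sketch of the DPO objective, not the project's training code; the log-probability values in the example are hypothetical:

```python
import math


def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * reward margin).

    The implicit reward of a response is beta * (policy logprob - reference
    logprob); the loss pushes the chosen response's reward above the
    rejected one's. beta=0.1 mirrors the value found for GEITje 7B Ultra.
    """
    chosen_margin = pi_logp_chosen - ref_logp_chosen
    rejected_margin = pi_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))


# When policy and reference agree exactly, the loss is log(2) ≈ 0.6931
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# A policy that prefers the chosen response gets a lower loss
print(dpo_loss(-9.0, -12.0, -10.0, -10.0))
```

A smaller beta tolerates larger divergence from the reference model before the loss saturates, which is why it is a common target of hyperparameter search.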

Intended Uses & Limitations

GEITje 7B Ultra is suitable for conversational AI applications that require strong Dutch language capabilities. However, because it was trained on synthetic data created with OpenAI/Azure services, the model may not be used for commercial purposes (hence the cc-by-nc-4.0 license). Like any language model, it can generate incorrect, misleading, or potentially offensive content; use it at your own risk.

Popular Sampler Settings

The top parameter combinations used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p