Rijgersberg/GEITje-7B
Rijgersberg/GEITje-7B is an open, 7-billion-parameter Dutch large language model based on the Mistral 7B architecture. It was further pretrained on 10 billion tokens of Dutch text from the Dutch Gigacorpus and the MADLAD-400 web corpus, significantly enhancing its Dutch language proficiency and its knowledge of Dutch topics. This full-parameter finetune retains the 8192-token context length of its base model and is intended for Dutch natural language processing tasks.
GEITje-7B: A Specialized Dutch Language Model
GEITje-7B is a 7-billion-parameter large language model developed by Rijgersberg and specifically enhanced for the Dutch language. It is built on the Mistral 7B base model, which is known for strong performance on English benchmarks, even outperforming larger models such as Llama 2 13B.
Key Capabilities & Training:
- Dutch Language Specialization: GEITje-7B underwent extensive further training on 10 billion tokens of Dutch text.
- Comprehensive Datasets: Training data included the Dutch Gigacorpus and the MADLAD-400 web crawling corpus.
- Full-Parameter Finetune: Unlike parameter-efficient methods such as LoRA or other PEFT techniques, this model was finetuned across all of its parameters, allowing deeper integration of Dutch language patterns.
- Context Length: It retains the 8192-token context window of its Mistral base.
Ideal Use Cases:
- Applications requiring high proficiency in the Dutch language.
- Tasks benefiting from Dutch cultural and topical knowledge.
- Developers seeking a powerful, open-source model for Dutch NLP projects.
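For the use cases above, the model can be loaded like any other causal language model on the Hugging Face Hub. The following is a minimal sketch using the `transformers` library; the model id comes from this card, while the generation settings and the example prompt are illustrative choices, not recommendations from the model authors. Note that GEITje-7B is a base model, so it works best with plain text-completion prompts rather than chat-style instructions.

```python
# Minimal sketch: Dutch text completion with GEITje-7B via transformers.
# Assumes the `transformers` and `torch` packages are installed and that
# enough (GPU) memory is available for a 7B model.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "Rijgersberg/GEITje-7B"   # model id from this card
MAX_CONTEXT = 8192                   # context window inherited from Mistral 7B


def complete_dutch(prompt: str, max_new_tokens: int = 128) -> str:
    """Continue a Dutch prompt with the base model (plain completion)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # illustrative dtype choice
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(complete_dutch("De hoofdstad van Nederland is"))
```

Because this is a full-parameter finetune rather than an adapter, no extra PEFT/LoRA loading step is needed; the checkpoint is used directly in place of the Mistral base weights.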