xaviergillard/digita

Text Generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Ctx length: 32k · Published: Dec 24, 2025 · Architecture: Transformer

Digita is an 8-billion-parameter conversational language model developed by Xavier Gillard, fine-tuned from Llama-3.1-8B-Instruct with QLoRA. It is specifically optimized to respond like the Belgian State Archive's customer support staff, and was trained on proprietary data from the "digit" mailbox of the DiVa section. The model serves as a proof of concept for a chatbot easing public access to archives, supporting Dutch, French, German, and English.


Model Overview

xaviergillard/digita is an 8-billion-parameter conversational language model, fine-tuned from unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit using QLoRA. Developed by Xavier Gillard and funded by BELSPO under the Arkey project, its primary purpose is to prototype a chatbot that interacts with users and answers their requests in a manner consistent with the Belgian State Archive's customer support personnel. The model was trained on a proprietary dataset derived from the "digit" mailbox of the DiVa section, with a focus on emulating human-like responses in an archival context.
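
Since the card names QLoRA on a pre-quantized 4-bit base, the snippet below is a minimal setup sketch using the transformers/peft stack. The actual training scripts live in the project's GitHub repository; the rank, alpha, dropout, and target modules shown here are illustrative defaults, not Digita's published hyperparameters.

```python
# Minimal QLoRA setup sketch; hyperparameters are illustrative, not Digita's.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"  # base ships 4-bit (bitsandbytes) weights

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
base = prepare_model_for_kbit_training(base)  # prepare the frozen 4-bit base for adapter training

lora_config = LoraConfig(
    r=16,                        # adapter rank: assumed value for illustration
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)  # only the low-rank adapters receive gradients
model.print_trainable_parameters()
```

In QLoRA the quantized base weights stay frozen and only the small adapter matrices are trained, which is what makes fine-tuning an 8B model tractable on a single GPU.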

Key Capabilities

  • Specialized Conversational AI: Fine-tuned to mimic the communication style of the Belgian State Archive's customer support staff.
  • Multilingual Support: Processes and generates responses in Dutch, French, German, and English (see the usage sketch after this list).
  • Proof-of-Concept: Demonstrates the feasibility of building domain-specific chatbots for public institutions with fine-tuning techniques.
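
As a usage sketch for the conversational and multilingual behavior above, the snippet below queries the model with the standard Llama 3.1 chat template, assuming the weights are hosted on the Hugging Face Hub under xaviergillard/digita. The system prompt and the Dutch example question are hypothetical, not taken from the model card.

```python
# Inference sketch: assumes the repo id "xaviergillard/digita" and the
# standard Llama 3.1 chat template; prompt contents are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xaviergillard/digita"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # Hypothetical system prompt; the card does not publish the one used in training.
    {"role": "system", "content": "You answer as the Belgian State Archive's customer support."},
    # Dutch example question ("How can I consult a birth certificate from 1890?").
    {"role": "user", "content": "Hoe kan ik een geboorteakte uit 1890 raadplegen?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swapping the user message for a French, German, or English question exercises the same template, since all four languages are covered by the training data.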

Limitations and Recommendations

Digita's training data spans the 2022–2024 period, so some information may be outdated, particularly regarding the AGATHA platform. The 8B parameter count is also noted as a potential limitation for broader applications. For serious deployment within the Belgian State Archive, the developer recommends repeating the fine-tuning with a larger model and a more recent dataset, positioning this release as a strong proof of concept for future development. All training scripts and notebooks (excluding the dataset) are available on GitHub.