BramVanroy/Llama-2-13b-chat-dutch
BramVanroy/Llama-2-13b-chat-dutch is a 13 billion parameter Llama 2-based conversational language model developed by Bram Vanroy and fine-tuned specifically for Dutch. It was trained on a collection of synthetic Dutch instruction and chat datasets with a 4096-token context length. The model aims to improve Dutch text generation and conversational ability over the original Llama 2 13B, performing noticeably better on Dutch prompts and offering reasonable programming assistance.
Overview
BramVanroy/Llama-2-13b-chat-dutch is a 13 billion parameter language model developed by Bram Vanroy, specifically fine-tuned for Dutch conversational tasks. It builds upon a pre-trained Llama 2 13B checkpoint that was further trained on Dutch data, then fine-tuned on a collection of synthetic Dutch instruction and chat datasets. The model operates with a 4096-token context length.
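For quick experimentation, the model can be loaded like any other Llama 2 checkpoint with the Hugging Face transformers library. The snippet below is a minimal sketch rather than the author's reference code; it assumes a CUDA GPU with enough memory for a 13B model in half precision and uses the standard text-generation pipeline.

```python
# Minimal sketch: load BramVanroy/Llama-2-13b-chat-dutch with transformers.
# Assumes a CUDA GPU; pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "BramVanroy/Llama-2-13b-chat-dutch"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 13B model on a single large GPU
    device_map="auto",
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "Leg in het kort uit wat een taalmodel is."
output = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```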
Key Capabilities
- Dutch Language Proficiency: Designed to produce Dutch text and engage in Dutch conversations, addressing the limited Dutch output of the original Llama 2 13B.
- Conversational AI: Fine-tuned on chat datasets to handle conversational prompts effectively.
- Programming Assistance: Shows decent performance in assisting with programming tasks.
- Safety Prompting: Incorporates a default system message during training to encourage helpful, respectful, and honest responses, aiming to mitigate harmful content (see the prompting sketch after this list).
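Because the model was trained with such a system message, chat-style prompts should include one as well. The sketch below continues from the loading snippet above (reusing `tokenizer` and `generator`) and assumes the repository ships a Llama 2-style chat template usable via `tokenizer.apply_chat_template`; the system prompt shown is illustrative, not the exact message used during training.

```python
# Illustrative chat-style prompting with a system message (continues the loading snippet above).
# Assumes the tokenizer provides a chat template; otherwise, fall back to the
# Llama 2 [INST] ... [/INST] format built by hand.
messages = [
    {"role": "system", "content": "Je bent een behulpzame, respectvolle en eerlijke assistent."},
    {"role": "user", "content": "Schrijf een korte Python-functie die een lijst omkeert."},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
output = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```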
Limitations and Considerations
- Synthetic Data Training: The model was trained on synthetic data generated and translated with OpenAI's API; under OpenAI's terms, this means the model may not be used to develop products that compete with OpenAI.
- No Human Feedback: Lacks human feedback training and safeguards, meaning it may produce unexpected or offensive content.
- Performance: While capable, the creator notes it was developed with limited compute and data, and suggests more powerful alternatives like Mistral-based GEITje 7B Ultra for better performance.
Training Details
The model was fine-tuned with LoRA adapters targeting the "q_proj" and "v_proj" projection layers, using 4-bit quantization and Flash Attention. Training used a learning rate of 2e-4 for 2 epochs, with a total batch size of 64 across 4 GPUs. Dialogues were kept intact within batches during preprocessing rather than being split across samples.
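These hyperparameters correspond to a fairly standard QLoRA-style setup. The configuration below is a reconstruction under stated assumptions, not the author's original training script: the LoRA rank and alpha are not reported above, and the per-device batch size and gradient-accumulation split are assumptions chosen so that 4 × 4 × 4 GPUs matches the reported total batch size of 64.

```python
# Reconstructed sketch of the reported training configuration (not the original script).
# Assumes transformers, peft, and bitsandbytes are installed.
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization of the base model (QLoRA-style). The base model would be loaded with
# quantization_config=bnb_config and, where supported, attn_implementation="flash_attention_2".
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

# LoRA adapters on the attention query/value projections only, as reported above.
lora_config = LoraConfig(
    r=16,                                 # assumption: rank not reported in this summary
    lora_alpha=32,                        # assumption
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Reported: lr 2e-4, 2 epochs, total batch size 64 across 4 GPUs.
# per_device_train_batch_size * gradient_accumulation_steps * 4 GPUs = 64
training_args = TrainingArguments(
    output_dir="llama2-13b-chat-dutch-lora",  # hypothetical output path
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=4,        # assumption
    gradient_accumulation_steps=4,        # assumption: 4 * 4 * 4 GPUs = 64
    fp16=True,
)
```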