BramVanroy/Llama-2-13b-chat-dutch

Text Generation · Model Size: 13B · Quant: FP8 · Context Length: 4k · Published: Aug 14, 2023 · Architecture: Transformer

BramVanroy/Llama-2-13b-chat-dutch is a 13 billion parameter Llama 2-based conversational language model developed by Bram Vanroy and fine-tuned specifically for Dutch. It was trained on a collection of synthetic Dutch instruction and chat datasets with a 4096-token context length. The model aims to improve Dutch text generation and conversational ability, performing well on Dutch-specific prompts and programming assistance.


Overview

BramVanroy/Llama-2-13b-chat-dutch is a 13 billion parameter language model developed by Bram Vanroy, specifically fine-tuned for Dutch conversational tasks. It builds upon a pre-trained Llama 2 13B checkpoint that was further trained on Dutch data, then fine-tuned on a collection of synthetic Dutch instruction and chat datasets. The model operates with a 4096-token context length.

Key Capabilities

  • Dutch Language Proficiency: Designed to produce Dutch text and engage in Dutch conversations, addressing the limited Dutch output of the original Llama 2 13B.
  • Conversational AI: Fine-tuned on chat datasets to handle conversational prompts effectively.
  • Programming Assistance: Shows decent performance in assisting with programming tasks.
  • Safety Prompting: Incorporates a default system message during training to encourage helpful, respectful, and honest responses, aiming to mitigate harmful content.

Limitations and Considerations

  • Synthetic Data Training: The model was trained on synthetic data translated using OpenAI's API, which, under OpenAI's terms of use, restricts using the model to build products that compete with OpenAI.
  • No Human Feedback: Lacks human feedback training and safeguards, meaning it may produce unexpected or offensive content.
  • Performance: While capable, the model was developed with limited compute and data; the creator recommends stronger alternatives, such as the Mistral-based GEITje 7B Ultra, for better performance.

Training Details

The model was trained using LoRA targeting the "q_proj" and "v_proj" attention projections, with the base model loaded in 4-bit and Flash Attention enabled. Training used a learning rate of 0.0002 over 2 epochs, with a total batch size of 64 across 4 GPUs. Dialogs were kept intact within batches during preprocessing.
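Concretely, LoRA freezes the base projection weights and learns a low-rank additive update. A minimal NumPy sketch of the idea applied to a single projection (the rank and scaling factor below are illustrative; the card does not state the LoRA rank or alpha used):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, r, alpha = 64, 8, 16  # illustrative sizes, not the actual training config

# Frozen base projection weight (stands in for q_proj or v_proj).
W = rng.normal(size=(d_model, d_model))

# LoRA factors: B starts at zero, so the adapter is a no-op at initialization.
A = rng.normal(scale=0.01, size=(r, d_model))
B = np.zeros((d_model, r))


def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = x W^T + (alpha / r) * x A^T B^T; only A and B receive gradients."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


x = rng.normal(size=(4, d_model))
y = lora_forward(x)
print(y.shape)  # (4, 64)
```

Only `A` and `B` are updated during fine-tuning, so the number of trainable parameters per adapted projection is `2 * r * d_model` rather than `d_model**2`, which is what makes 4-bit training of a 13B model feasible on a few GPUs.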