robinsmits/Qwen1.5-7B-Dutch-Chat-Sft-Bf16

Visibility: Public
Parameters: 7.7B
Precision: FP8
Context length: 32768
Released: Mar 26, 2024
License: cc-by-nc-4.0
Source: Hugging Face

Overview

This model, robinsmits/Qwen1.5-7B-Dutch-Chat-Sft-Bf16, is a 7.7 billion parameter language model based on the Qwen1.5 architecture, fine-tuned specifically for Dutch conversational AI. It was developed by robinsmits via supervised fine-tuning (SFT) on the BramVanroy/ultrachat_200k_dutch dataset.

Key Capabilities

  • Dutch Language Proficiency: Specialized in understanding and generating natural, conversational Dutch.
  • Chat-Optimized: Fine-tuned on a chat-specific dataset, making it suitable for dialogue systems and interactive applications.
  • Qwen1.5 Base: Benefits from the robust architecture and capabilities of the Qwen1.5 series.
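
The model can be loaded with the Hugging Face transformers library. The sketch below is a minimal usage example assuming the standard Qwen1.5 chat template shipped with the tokenizer; the generation settings (sampling, temperature, token budget) are illustrative choices, not values from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "robinsmits/Qwen1.5-7B-Dutch-Chat-Sft-Bf16"

# Load the tokenizer and model; bfloat16 matches the Bf16 suffix in the name.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Format a Dutch prompt using the tokenizer's built-in chat template.
messages = [
    {"role": "user", "content": "Wat zijn de drie grootste steden van Nederland?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```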

Training Details

The model was trained on Google Colab PRO with a single A100 40GB GPU. Because of Colab session time limits, the dataset was split into two parts, and 'resume_from_checkpoint' was used so the second run continued from where the first stopped. Key hyperparameters: a learning rate of 0.0003, a per-device batch size of 2 with 32 gradient accumulation steps (an effective batch size of 64), and a cosine learning rate scheduler. Training ran for 1460 steps and reached a validation loss of 1.1756.
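
As a rough illustration, these reported hyperparameters map onto Hugging Face TrainingArguments as sketched below. The output path is hypothetical, and the trainer and dataset wiring (omitted here) would use the BramVanroy/ultrachat_200k_dutch splits; only the commented values come from the model card.

```python
from transformers import TrainingArguments

# Values marked "as reported" come from the model card; the rest are assumptions.
args = TrainingArguments(
    output_dir="qwen1.5-7b-dutch-chat-sft",  # hypothetical output path
    learning_rate=3e-4,                      # 0.0003, as reported
    per_device_train_batch_size=2,           # batch size 2, as reported
    gradient_accumulation_steps=32,          # as reported; effective batch 2 * 32 = 64
    lr_scheduler_type="cosine",              # cosine scheduler, as reported
    bf16=True,                               # consistent with the Bf16 model suffix
)

# Training was split across two Colab sessions; the second run continues from
# the last saved checkpoint rather than starting over:
# trainer.train(resume_from_checkpoint=True)
```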

Intended Uses & Limitations

This model is designed for applications requiring high-quality Dutch conversational responses. Like all LLMs, however, it may exhibit biases and hallucinations, so thorough testing and validation are needed for any use case. Note also that the training dataset does not permit commercial use, consistent with the model's cc-by-nc-4.0 license.