Overview
This model, llama2-13b-ft-mc4_nl_cleaned_tiny, is a fine-tuned variant of Meta's Llama-2-13b-hf, developed by Bram Vanroy. Its primary purpose is to improve the fluency of the Llama 2 architecture in Dutch, rather than to expand its general knowledge base. The model was trained for one epoch on the tiny partition of the yhavinga/mc4_nl_cleaned dataset, with a context window of 4096 tokens.
Key Capabilities
- Improved Dutch Fluency: Specifically fine-tuned to generate more natural and fluent Dutch text.
- Generative Tasks: Intended for various generative applications in Dutch.
- Further Fine-tuning: Can serve as a base for additional fine-tuning on tasks such as summarization, adaptation, or instruction-following.
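For the further fine-tuning use case, a PEFT configuration along these lines could be a starting point. Only the target modules (q_proj and v_proj) mirror the original training setup described below; the rank, alpha, and dropout values are illustrative assumptions, not the settings used for this model:

```python
# Hypothetical LoRA configuration for continued fine-tuning with PEFT.
# r, lora_alpha, and lora_dropout are illustrative assumptions; only
# target_modules matches the original run (q_proj and v_proj).
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                  # assumed adapter rank
    lora_alpha=32,                         # assumed scaling numerator
    target_modules=["q_proj", "v_proj"],   # as in the original training
    lora_dropout=0.05,                     # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
)
```

This configuration would then be passed to `peft.get_peft_model` together with the loaded base model before training.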
Training Details
The model was trained using LoRA (Low-Rank Adaptation) targeting the q_proj and v_proj layers, with the base model loaded in 4-bit precision; the adapters were merged back into the base weights before upload. Flash Attention was used during training. Training ran for one epoch with a learning rate of 0.0003, a total batch size of 1152, and a cosine learning-rate scheduler. Validation loss decreased from 1.8820 to 1.7676 over the course of training.
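The adapter-merge step mentioned above can be sketched numerically: LoRA learns a low-rank update ΔW = (α/r)·B·A for a frozen weight matrix W, and merging simply adds that update into the base weights so no extra computation is needed at inference time. The NumPy sketch below uses toy dimensions and scaling values, not this model's actual configuration:

```python
import numpy as np

# Toy dimensions; a real Llama-2-13b attention projection is far larger.
d, r, alpha = 8, 2, 16          # hidden size, LoRA rank, scaling numerator

rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))     # frozen base weight (e.g. q_proj)
A = rng.normal(size=(r, d))     # LoRA "down" matrix
B = np.zeros((d, r))            # LoRA "up" matrix, initialized to zero

# Before training, B is zero, so the adapter is a no-op.
assert np.allclose(W + (alpha / r) * (B @ A), W)

# After training (simulated here with random values), merging folds the
# low-rank update into the base weight in place.
B = rng.normal(size=(d, r))
W_merged = W + (alpha / r) * (B @ A)
```

Merging means the uploaded checkpoint behaves like an ordinary dense model, which is why the adapters are no longer visible as separate weights.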
Open LLM Leaderboard Evaluation
On the Open LLM Leaderboard, the model achieved an average score of 46.81 across the full benchmark suite. Selected per-task scores (a subset, so they do not average to this figure):
- ARC (25-shot): 59.3
- HellaSwag (10-shot): 82.04
- MMLU (5-shot): 54.67
- TruthfulQA (0-shot): 38.03
Good for
- Applications requiring high-quality Dutch text generation.
- Developers looking for a Llama 2 base model with enhanced Dutch language capabilities.
- Research and development in Dutch NLP, particularly for generative tasks.