ChocoLlama/Llama-3-ChocoLlama-8B-instruct
Overview
This model is an 8-billion-parameter instruction-tuned variant from the ChocoLlama family, developed by Matthieu Meeus and Anthony Rathé. It is built on Meta's Llama-3-8B architecture and adapted for the Dutch language. The base model, Llama-3-ChocoLlama-8B-base, was fine-tuned with LoRA on a 104GB Dutch corpus (32 billion tokens, as counted by the Llama-2 tokenizer), and this instruction-tuned version was further aligned through Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).
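A minimal usage sketch with Hugging Face transformers is shown below. It assumes the tokenizer ships with a chat template and that a GPU with bfloat16 support is available; the Dutch prompt and the sampling parameters are illustrative, not part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ChocoLlama/Llama-3-ChocoLlama-8B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    device_map="auto",
)

# Build a Dutch chat prompt using the tokenizer's bundled chat template.
messages = [
    {"role": "user", "content": "Wat zijn de drie grootste steden van België?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Illustrative sampling settings; tune them for your application.
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```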
Key Capabilities
- Dutch Language Proficiency: Specifically adapted and fine-tuned for high-quality Dutch language understanding and generation.
- Instruction Following: Instruction-tuned using SFT and DPO on Dutch translations of various instruction datasets, making it suitable for conversational AI (see the prompt-format sketch after this list).
- Strong Performance: Achieves an average score of 0.53 across Dutch versions of ARC, HellaSwag, MMLU, and TruthfulQA, outperforming other prominent Dutch models of comparable size.
- Llama-3 Architecture: Benefits from the robust capabilities of the Llama-3 base model.
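Because the model is instruction-tuned, prompts should follow the chat format defined by the tokenizer. The sketch below renders a hypothetical multi-turn Dutch conversation into the raw prompt string; the exact template is whatever ships in the tokenizer config, so this only displays it rather than assuming its contents.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ChocoLlama/Llama-3-ChocoLlama-8B-instruct")

# A hypothetical multi-turn exchange, to show how turns are serialized.
messages = [
    {"role": "user", "content": "Vat dit artikel samen in twee zinnen."},
    {"role": "assistant", "content": "Natuurlijk, hier is een samenvatting..."},
    {"role": "user", "content": "Kun je het nog korter maken?"},
]

# Render the template without tokenizing, to inspect the expected prompt format.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```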
Good for
- Dutch Conversational AI: Ideal for chatbots, virtual assistants, and interactive applications requiring fluent and contextually appropriate Dutch responses.
- Dutch Text Generation: Generating various forms of Dutch text, from creative writing to informative content (see the pipeline sketch after this list).
- Research in Dutch LLMs: Serves as a strong baseline or component for further research and development in Dutch natural language processing.
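For quick experiments, the transformers text-generation pipeline also accepts chat messages directly. A minimal sketch follows; the prompt and generation length are placeholders.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ChocoLlama/Llama-3-ChocoLlama-8B-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The pipeline applies the chat template internally when given message dicts.
messages = [{"role": "user", "content": "Schrijf een kort gedicht over de Noordzee."}]
result = generator(messages, max_new_tokens=200)

# The assistant's reply is the last message in the returned conversation.
print(result[0]["generated_text"][-1]["content"])
```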
Note: The instruction-tuning datasets were translated using GPT-3.5/4, which means this instruction-tuned model cannot be used commercially. For commercial applications, fine-tuning the base model on proprietary Dutch data is recommended (a sketch follows below). For more details, refer to the ChocoLlama paper.
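Below is a minimal sketch of that recommendation: LoRA fine-tuning Llama-3-ChocoLlama-8B-base on your own Dutch text with the peft and transformers libraries. The dataset file (dutch_corpus.txt), the LoRA target modules, and all hyperparameters are illustrative assumptions, not values from the ChocoLlama paper.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "ChocoLlama/Llama-3-ChocoLlama-8B-base"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical dataset: one proprietary Dutch document per line in a text file.
dataset = load_dataset("text", data_files={"train": "dutch_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Illustrative LoRA configuration; rank and target modules should be tuned.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="chocollama-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After training, only the small LoRA adapter weights need to be saved and shipped alongside the base model, which keeps the fine-tuned artifact lightweight.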