ChocoLlama/Llama-3-ChocoLlama-8B-instruct
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Jun 7, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open Weights

ChocoLlama/Llama-3-ChocoLlama-8B-instruct is an 8 billion parameter instruction-tuned causal language model developed by Matthieu Meeus and Anthony Rathé. It is a Dutch language-adapted version of Meta's Llama-3-8B, trained on 32 billion Dutch tokens and further aligned using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). The model is optimized specifically for Dutch text generation in conversational settings, achieving state-of-the-art performance on Dutch benchmarks within its weight class.
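Since the model is a Llama-3 fine-tune, it presumably inherits the standard Llama 3 chat template. The sketch below builds a Dutch conversational prompt in that format by hand, purely for illustration; in practice one would let `tokenizer.apply_chat_template` from Hugging Face transformers render this (the helper name `build_llama3_prompt` and the example message are assumptions, not part of the card):

```python
# Illustrative sketch: render chat messages in the Llama 3 prompt format,
# which this Llama-3-based model is assumed to use. Real code should rely on
# tokenizer.apply_chat_template instead of hand-building the string.

def build_llama3_prompt(messages):
    """Render a list of {'role': ..., 'content': ...} dicts as a Llama 3 prompt string."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

# Example Dutch conversational turn, as the model targets Dutch chat use.
messages = [{"role": "user", "content": "Schrijf een kort gedicht over de zee."}]
prompt = build_llama3_prompt(messages)
print(prompt)
```

The rendered string can then be tokenized and passed to the model for generation like any other causal LM prompt.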
