jpacifico/Chocolatine-14B-Instruct-DPO-v1.2
Chocolatine-14B-Instruct-DPO-v1.2 is a 14.7 billion parameter instruction-tuned causal language model developed by jpacifico, fine-tuned from Microsoft's Phi-3-medium-4k-instruct. The model uses DPO fine-tuning with a French RLHF dataset, which improves its performance in both French and English beyond its base model. With a 4K context length, it handles conversational tasks well and ranks strongly on the OpenLLM Leaderboard for its size class.
Chocolatine-14B-Instruct-DPO-v1.2 Overview
Chocolatine-14B-Instruct-DPO-v1.2 is a 14.7 billion parameter instruction-tuned language model developed by jpacifico. It is a DPO (Direct Preference Optimization) fine-tune of the microsoft/Phi-3-medium-4k-instruct base model, trained on the jpacifico/french-orca-dpo-pairs-revised RLHF dataset. Notably, although the preference data is French, the fine-tuning also improves performance in English, often surpassing the base model.
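DPO trains the policy directly on preference pairs, without a separate reward model. The core per-pair loss can be sketched as follows; this is a pure-Python illustration of the standard DPO objective, not the actual training code, and the log-probability values below are placeholders:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    The loss pushes the policy to widen the log-probability margin between
    the chosen and rejected responses, measured relative to a frozen
    reference model and scaled by the temperature beta.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log sigmoid(beta * margin)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Placeholder sequence log-probabilities, for illustration only.
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_logp_chosen=-13.0, ref_logp_rejected=-14.0)
```

The loss decreases as the policy assigns relatively more probability to the chosen response than the reference model does.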
Key Capabilities & Performance
- Multilingual Proficiency: Performs strongly in both French and English, outperforming its base model in both languages.
- OpenLLM Leaderboard: As of October 18, 2024, Chocolatine-14B-Instruct-DPO-v1.2 was the best-performing model in the 13B size category on the OpenLLM Leaderboard, achieving an average score of 33.3.
- MT-Bench-French: Outperforms its previous versions and the Phi-3-medium-4k-instruct base model on the MT-Bench-French benchmark, particularly in conversational turns.
- Context Window: Features a 4K token context window, suitable for handling moderately long interactions.
Usage & Limitations
This model is available in a 4-bit quantized GGUF version and can be run via Ollama. It demonstrates that targeted DPO fine-tuning can deliver competitive performance at this scale. Note, however, that it includes no built-in moderation mechanisms. The model is released under the MIT license.
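Because the model derives from Phi-3-medium, prompts follow the Phi-3 chat format, which delimits turns with `<|user|>`, `<|assistant|>`, and `<|end|>` tags. A minimal sketch of assembling such a prompt by hand (in practice the tokenizer's chat template should be used; verify the exact tags against the model's tokenizer config):

```python
def build_phi3_prompt(messages):
    """Assemble a Phi-3-style chat prompt from (role, content) pairs.

    Each turn is wrapped as <|role|>\n...<|end|>\n, and the prompt ends
    with an open <|assistant|> tag so the model generates the reply.
    """
    parts = []
    for role, content in messages:
        parts.append(f"<|{role}|>\n{content}<|end|>\n")
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_phi3_prompt([("user", "Bonjour, qui es-tu ?")])
# prompt == "<|user|>\nBonjour, qui es-tu ?<|end|>\n<|assistant|>\n"
```

The same string can then be passed to a llama.cpp- or Ollama-served GGUF build of the model as the raw prompt.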