Overview
Chocolatine-3B-Instruct-DPO-Revised: A High-Performing 3B LLM
Chocolatine-3B-Instruct-DPO-Revised is a 3.82-billion-parameter instruction-tuned model developed by Jonathan Pacifico, built on Microsoft's Phi-3-mini-4k-instruct. It was fine-tuned with Direct Preference Optimization (DPO) on a French preference (RLHF-style) dataset, which not only strengthens its French capabilities but also lifts its English performance above the base model.
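For orientation, the sketch below shows the standard DPO objective (Rafailov et al., 2023) that this kind of preference tuning optimizes. The function name, tensor shapes, and beta value are illustrative assumptions, not the author's actual training setup.

```python
# Minimal sketch of the DPO objective; illustrative only, not the
# training code used for Chocolatine-3B-Instruct-DPO-Revised.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss from summed token log-probs of preferred/rejected answers."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps        # log pi/pi_ref, preferred
    rejected_logratio = policy_rejected_logps - ref_rejected_logps  # log pi/pi_ref, rejected
    # Widen the margin between preferred and rejected answers, scaled by beta.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy values standing in for per-response log-probabilities.
print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.0])))
```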
Key Capabilities & Performance:
- Bilingual Strength: Outperforms GPT-3.5-Turbo on the MT-Bench-French benchmark and approaches the French performance of Phi-3-Medium (14B).
- Leading 3B Model: Ranked as the best-performing 3B model on the OpenLLM Leaderboard (August 2024), surpassing even Microsoft's Phi-3.5-mini-instruct in average benchmark score within its size category.
- Robust General Benchmarks: Achieves an average score of 27.63 on the OpenLLM Leaderboard, with notable scores in IFEval (56.23), BBH (37.16), and MMLU-PRO (33.21).
- Context Window: 4k tokens, inherited from the Phi-3-mini-4k-instruct base.
Good for:
- Applications requiring strong performance in both French and English.
- Efficiency-sensitive use cases where a compact yet capable 3B-parameter model is preferred.
- Developers seeking a highly ranked model for general instruction-following tasks (a minimal loading sketch follows below).
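Usage: the snippet below is a minimal sketch of loading the model with Hugging Face transformers and running one chat turn. The repo ID jpacifico/Chocolatine-3B-Instruct-DPO-Revised, dtype, and generation settings are assumptions based on the standard Phi-3 setup; consult the model card for the author's exact snippet.

```python
# Minimal loading/inference sketch with Hugging Face transformers.
# Repo ID and settings are assumptions; check the model card for specifics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jpacifico/Chocolatine-3B-Instruct-DPO-Revised"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use float16 on GPUs without bf16 support
    device_map="auto",
)

# The tokenizer ships a chat template (Phi-3 style); build one user turn.
messages = [{"role": "user", "content": "Explique la photosynthèse en une phrase."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep prompt plus generation well inside the 4k-token context window.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```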