jpacifico/Chocolatine-3B-Instruct-DPO-Revised

3.82B parameters · BF16 · 4,096-token context · License: MIT

Chocolatine-3B-Instruct-DPO-Revised: A High-Performing 3B LLM

Chocolatine-3B-Instruct-DPO-Revised is a 3.82 billion parameter instruction-tuned model developed by Jonathan Pacifico, based on Microsoft's Phi-3-mini-4k-instruct. It leverages Direct Preference Optimization (DPO) using a French RLHF dataset, which not only enhances its French language capabilities but also improves its English performance beyond the base model.

Key Capabilities & Performance:

  • Multilingual Excellence: Outperforms GPT-3.5-Turbo on the MT-Bench-French benchmark, approaching the performance of Phi-3-Medium (14B) in French.
  • Leading 3B Model: Ranked as the best-performing 3B model on the OpenLLM Leaderboard (August 2024), surpassing even Microsoft's Phi-3.5-mini-instruct in average benchmark score for its size category.
  • Robust General Benchmarks: Achieves an average score of 27.63 on the OpenLLM Leaderboard, with notable scores in IFEval (56.23), BBH (37.16), and MMLU-PRO (33.21).
  • Context Window: Features a 4k-token (4,096) context window, inherited from the Phi-3-mini-4k-instruct base model.

Good for:

  • Applications requiring strong performance in both French and English.
  • Use cases where a compact yet powerful 3B-parameter model is preferred for efficiency.
  • Developers seeking a highly-ranked model for general instruction-following tasks.
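To try the model for the instruction-following tasks above, a minimal inference sketch with the Hugging Face `transformers` library might look as follows. Only the model id comes from this card; the generation settings (greedy decoding, 256 new tokens) and the `generate_reply` helper are illustrative choices, not part of the official usage instructions.

```python
# Minimal sketch, assuming `transformers` and `torch` are installed and that
# enough memory is available to load a ~3.8B-parameter model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "jpacifico/Chocolatine-3B-Instruct-DPO-Revised"


def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format used by apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]


def generate_reply(user_prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model, format the prompt with its chat template, and decode a reply."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(
        input_ids, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Keep only the newly generated continuation, dropping the prompt tokens.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    # French prompt, since strong French performance is the model's headline feature.
    print(generate_reply("Explique la différence entre DPO et RLHF en une phrase."))
```

Because the model stays within a 4k-token context window, long prompts plus `max_new_tokens` should be budgeted to fit under 4,096 tokens in total.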