jpacifico/Chocolatine-2-14B-Instruct-v2.0.3

Status: Warm
Visibility: Public
Parameters: 14.8B
Quantization: FP8
Context length: 131,072 tokens
Released: Feb 6, 2025
License: apache-2.0
Source: Hugging Face
Overview

Chocolatine-2-14B-Instruct-v2.0.3 is a 14.8-billion-parameter instruction-tuned model developed by Jonathan Pacifico on the Qwen-2.5-14B architecture. It was fine-tuned with Direct Preference Optimization (DPO) on the jpacifico/french-orca-dpo-pairs-revised preference dataset, specifically strengthening its French-language capabilities. The model supports a context window of up to 128K (131,072) tokens.
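
The model can be driven through the standard Hugging Face transformers chat workflow. The sketch below is illustrative rather than an official quickstart: the model ID comes from this card, while the dtype, device placement, sampling settings, and French prompt are assumptions to adapt to your hardware.

    # Minimal usage sketch (assumed setup, not an official quickstart).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~30 GB of weights; quantize on smaller GPUs
        device_map="auto",
    )

    # Instruct models built on Qwen-2.5 ship a chat template; apply_chat_template
    # formats the conversation the way the model saw it during fine-tuning.
    messages = [{"role": "user", "content": "Explique la photosynthèse en trois phrases."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))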

Key Capabilities & Performance

  • French Language Proficiency: Achieves top 3 rankings across all categories on the French Government Leaderboard LLM FR.
  • Benchmark Excellence: Strongest open-weights model on the COLE Benchmark (Laval University), with a composite score of 45.05%; details are reported in the associated paper.
  • MT-Bench-French Performance: Outperforms previous Chocolatine versions and the base Qwen-2.5 model on MT-Bench-French, closely approaching GPT-4o-mini's performance in French.
  • Multilingual Support: While primarily optimized for French, it also supports English.

Good For

  • Applications requiring high-quality French language generation and understanding.
  • Developers seeking a powerful open-source model for French-centric tasks.
  • Use cases benefiting from a large context window (up to 128K tokens); a serving sketch follows this list.
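
For long-context use, an inference engine such as vLLM is a common choice. The following is a hedged sketch, not a documented deployment for this listing: whether this catalog entry actually runs on vLLM is an assumption, the FP8 tag above is taken to mean FP8 weight quantization, and max_model_len mirrors the 131,072-token window from the card.

    # Hedged long-context serving sketch; all settings are assumptions to tune.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="jpacifico/Chocolatine-2-14B-Instruct-v2.0.3",
        max_model_len=131072,   # full 128K context from this card
        quantization="fp8",     # assumption based on the FP8 tag above
    )

    sampling = SamplingParams(temperature=0.7, max_tokens=512)
    conversation = [
        {"role": "user", "content": "Quels sont les avantages d'une grande fenêtre de contexte ?"}
    ]
    outputs = llm.chat(conversation, sampling)
    print(outputs[0].outputs[0].text)

Serving the full 131,072-token window is memory-intensive; lowering max_model_len is the usual lever when the KV cache does not fit on your GPU.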

Limitations

This model series serves as a demonstration of effective fine-tuning. It does not include any built-in moderation mechanisms.