jpacifico/Chocolatine-14B-Instruct-DPO-v1.2

Text generation · Concurrency cost: 1 · Model size: 14.7B · Quantization: FP8 · Context length: 32K · Published: Aug 12, 2024 · License: MIT · Architecture: Transformer

Chocolatine-14B-Instruct-DPO-v1.2 is a 14.7-billion-parameter instruction-tuned causal language model developed by jpacifico and fine-tuned from Microsoft's Phi-3-medium-4k-instruct. It was aligned with DPO on a French RLHF dataset, which improves its performance in both French and English beyond that of its base model. It handles conversational tasks well and ranks strongly on the OpenLLM Leaderboard for its size class.


Chocolatine-14B-Instruct-DPO-v1.2 Overview

Chocolatine-14B-Instruct-DPO-v1.2 is a 14.7-billion-parameter instruction-tuned language model developed by jpacifico. It is a DPO (Direct Preference Optimization) fine-tune of the microsoft/Phi-3-medium-4k-instruct base model, trained on the jpacifico/french-orca-dpo-pairs-revised RLHF dataset. A notable characteristic is that training on French preference data also improves its English performance, often surpassing the base model.
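The DPO objective behind this fine-tune can be illustrated on toy numbers. The sketch below is the standard DPO loss for a single preference pair, not the author's actual training code, and the log-probabilities are made up for illustration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    The loss is low when the policy model raises the chosen completion's
    log-probability (relative to the frozen reference model) more than
    it raises the rejected completion's.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-margin))

# Toy pair: the policy already prefers the chosen answer slightly more
# than the reference does, so the loss dips below log(2) (~0.693).
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
print(round(loss, 4))  # → 0.5981
```

At zero margin the loss equals log(2); a full training run averages this quantity over a batch of preference pairs and backpropagates through the policy's log-probabilities only.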

Key Capabilities & Performance

  • Multilingual Proficiency: Performs strongly in both French and English, outperforming its base model in both languages.
  • OpenLLM Leaderboard: As of October 18, 2024, Chocolatine-14B-Instruct-DPO-v1.2 was the best-performing model in the 13B size category on the OpenLLM Leaderboard, achieving an average score of 33.3.
  • MT-Bench-French: Outperforms its previous versions and the Phi-3-medium-4k-instruct base model on the MT-Bench-French benchmark, particularly in conversational turns.
  • Context Window: Features a 4K token context window, suitable for handling moderately long interactions.

Usage & Limitations

This model is also available as a 4-bit quantized GGUF build and can be run locally via Ollama. It demonstrates that targeted DPO fine-tuning can deliver compelling performance at this scale. Note, however, that it includes no built-in moderation mechanisms. The model is released under the MIT license.
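Because the model inherits Phi-3's chat format, a prompt for the GGUF/Ollama build can be assembled by hand. This is a minimal sketch assuming the standard Phi-3 `<|user|>` / `<|assistant|>` / `<|end|>` markers; verify against the model's own chat template (or the Ollama Modelfile) before relying on it:

```python
def build_phi3_prompt(messages):
    """Render a chat history with Phi-3-style special tokens.

    `messages` is a list of {"role": ..., "content": ...} dicts, as used
    by most chat APIs. The template here is an assumption based on the
    Phi-3 base model, not taken from this model's card.
    """
    out = []
    for m in messages:
        out.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    out.append("<|assistant|>\n")  # generation continues from here
    return "".join(out)

prompt = build_phi3_prompt([
    {"role": "user", "content": "Bonjour, qui es-tu ?"},
])
print(prompt)
```

The same string can be sent to a local llama.cpp or Ollama endpoint as a raw prompt when the serving layer does not apply the chat template itself.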