Metin/LLaMA-3-8B-Instruct-TR-DPO

Parameters: 8B
Precision: FP8
Context length: 8192
License: llama3

Model Overview

Metin/LLaMA-3-8B-Instruct-TR-DPO is an 8-billion-parameter instruction-tuned model built on Meta-LLaMA-3-8B-Instruct. Developed by Metin, it is distinguished by specialized fine-tuning for Turkish using a synthetically generated preference dataset of 10,000 samples. Training ran for roughly 3 hours on a single RTX 6000 Ada GPU with a QLoRA configuration (lora_r: 64, lora_alpha: 32, lora_dropout: 0.05) and aims to significantly improve the quality and naturalness of the model's Turkish outputs.
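The QLoRA hyperparameters above can be collected into a small config sketch. Only lora_r, lora_alpha, and lora_dropout come from the model card; the target modules, DPO beta, and learning rate shown here are illustrative assumptions, not documented settings.

```python
# Sketch of the DPO + QLoRA setup described on the card.
# lora_r, lora_alpha, lora_dropout: from the card.
# target_modules, beta, learning_rate: assumed for illustration.
qlora_config = {
    "lora_r": 64,          # rank of the LoRA update matrices (from card)
    "lora_alpha": 32,      # LoRA scaling numerator (from card)
    "lora_dropout": 0.05,  # dropout on LoRA layers (from card)
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
}

dpo_config = {
    "beta": 0.1,            # common DPO KL-penalty strength (assumed)
    "learning_rate": 5e-5,  # assumed
    "num_samples": 10_000,  # synthetic preference pairs (from card)
}

# The effective LoRA scaling applied to the adapter updates is alpha / r.
scaling = qlora_config["lora_alpha"] / qlora_config["lora_r"]
```

In practice these values would be passed to `peft.LoraConfig` and TRL's `DPOTrainer`; note that with alpha (32) below r (64), the adapter updates are scaled down by a factor of 0.5.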

Key Capabilities

  • Enhanced Turkish Fluency: Generates more coherent and natural-sounding text in Turkish.
  • Improved Content Quality: Delivers more informative and detailed answers for Turkish instructions.
  • Preference-Tuned: Trained with Direct Preference Optimization (DPO) to produce more 'likable', preferable outputs.

Performance Benchmarks (OpenLLMTurkishLeaderboard_v0.2)

  • MMLU_TR_V0.2: 49.83%
  • Truthful_QA_TR_V0.2: 52.32%
  • ARC_TR_V0.2: 44.37%
  • HellaSwag_TR_V0.2: 45.58%
  • GSM8K_TR_V0.2: 54.21%
  • Winogrande_TR_V0.2: 55.06%
  • Average: 50.22%
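The reported average can be reproduced from the six per-task scores above; the unrounded mean (~50.23) matches the card's 50.22% up to rounding.

```python
# OpenLLMTurkishLeaderboard_v0.2 scores from the list above, in percent.
scores = {
    "MMLU_TR_V0.2": 49.83,
    "Truthful_QA_TR_V0.2": 52.32,
    "ARC_TR_V0.2": 44.37,
    "HellaSwag_TR_V0.2": 45.58,
    "GSM8K_TR_V0.2": 54.21,
    "Winogrande_TR_V0.2": 55.06,
}

# Unweighted mean across the six tasks, as the leaderboard averages it.
average = sum(scores.values()) / len(scores)
```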

When to Use This Model

This model is particularly well-suited for applications requiring high-quality, fluent, and detailed text generation in Turkish. While not necessarily 'smarter' than its base model, its DPO fine-tuning makes its outputs more agreeable and contextually appropriate for Turkish users. Developers should consider this model for Turkish-centric chatbots, content generation, or any task where nuanced and natural Turkish language is critical.
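Since the model keeps the base Llama 3 Instruct chat format, a chatbot prompt is assembled with the standard header/eot special tokens. A minimal sketch for a single turn (the Turkish system and user strings are illustrative examples, not from the card):

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 Instruct chat format."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "Sen yardımsever bir asistansın.",          # "You are a helpful assistant."
    "İstanbul hakkında kısa bir paragraf yaz.",  # "Write a short paragraph about Istanbul."
)
```

In practice, loading the model's tokenizer with transformers and calling `tokenizer.apply_chat_template` produces this string automatically; building it by hand is shown here only to make the expected format explicit.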