Model Overview
Metin/LLaMA-3-8B-Instruct-TR-DPO is an 8-billion-parameter instruction-tuned model built on Meta-LLaMA-3-8B-Instruct. Developed by Metin, it is differentiated by specialized fine-tuning for Turkish, using a synthetically generated preference dataset of 10,000 samples. Training took roughly 3 hours on a single RTX 6000 Ada GPU under a QLoRA configuration (lora_r: 64, lora_alpha: 32, lora_dropout: 0.05) and aims to improve the quality and naturalness of the model's Turkish outputs.
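For orientation, the setup described above can be sketched with the peft and trl libraries. Only the LoRA hyperparameters come from the model card; the dataset file, batch sizes, and remaining settings below are illustrative assumptions, not the author's actual training script.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"

# QLoRA: quantize the frozen base model to 4-bit, then train LoRA adapters on top.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE)

# LoRA hyperparameters as stated in the model card.
peft_config = LoraConfig(r=64, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Hypothetical file: any preference dataset with "prompt"/"chosen"/"rejected" columns works.
train_dataset = load_dataset("json", data_files="turkish_preferences.jsonl", split="train")

args = DPOConfig(
    output_dir="llama3-8b-tr-dpo",
    per_device_train_batch_size=2,   # illustrative, not from the card
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # older trl releases take tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```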
Key Capabilities
- Enhanced Turkish Fluency: Generates more coherent and natural-sounding text in Turkish.
- Improved Content Quality: Delivers more informative and detailed answers for Turkish instructions.
- Preference-Tuned: Optimized via Direct Preference Optimization (DPO) to produce outputs users are more likely to prefer (a sketch of the objective follows this list).
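For readers unfamiliar with DPO: it dispenses with a separately trained reward model and instead applies a logistic loss to log-probability ratios between the policy and a frozen reference model. A minimal sketch of the objective, not this model's actual training code:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Per-example DPO loss from summed token log-probs of each response.

    beta controls how far the policy may drift from the reference model;
    0.1 is a common default, not a value stated in the model card.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin) pushes the policy toward the "chosen" response.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```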
Performance Benchmarks (OpenLLMTurkishLeaderboard_v0.2)
- MMLU_TR_V0.2: 49.83%
- Truthful_QA_TR_V0.2: 52.32%
- ARC_TR_V0.2: 44.37%
- HellaSwag_TR_V0.2: 45.58%
- GSM8K_TR_V0.2: 54.21%
- Winogrande_TR_V0.2: 55.06%
- Average: 50.22%
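Reproducing these numbers would look roughly like the sketch below, assuming a build of lm-evaluation-harness that includes the Turkish leaderboard tasks; the task identifiers are inferred from the benchmark names above and may differ in the leaderboard's actual fork.

```python
# Hypothetical reproduction of the leaderboard scores via lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Metin/LLaMA-3-8B-Instruct-TR-DPO,dtype=bfloat16",
    # Assumed task names; check the Turkish harness fork for the exact identifiers.
    tasks=["mmlu_tr_v0.2", "truthfulqa_tr_v0.2", "arc_tr_v0.2",
           "hellaswag_tr_v0.2", "gsm8k_tr_v0.2", "winogrande_tr_v0.2"],
    batch_size=8,
)
print(results["results"])
```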
When to Use This Model
This model is well suited to applications requiring high-quality, fluent, and detailed text generation in Turkish. It is not necessarily 'smarter' than its base model, but DPO fine-tuning makes its outputs more agreeable and contextually appropriate for Turkish users. Consider it for Turkish-centric chatbots, content generation, or any task where nuanced, natural Turkish is critical; a minimal inference sketch follows.
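A minimal inference sketch using Hugging Face transformers and the model's chat template; the prompt and generation settings are illustrative choices, not recommendations from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Metin/LLaMA-3-8B-Instruct-TR-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example Turkish instruction: "Briefly introduce Istanbul's historic peninsula."
messages = [{"role": "user", "content": "İstanbul'un tarihi yarımadasını kısaca tanıt."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```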