wololoo/Llama-3.2-3B-TR-Instruct-DPO is a 3B-parameter causal language model developed by wololoo and fine-tuned from unsloth/Llama-3.2-3B-Instruct. It is optimized for Turkish-language capability and for reasoning in STEM fields, trained with a two-stage pipeline of supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO). The model serves as a general-purpose Turkish assistant, excelling at Turkish question answering and providing foundational information on science, technology, and engineering topics, and supports a context length of 32,768 tokens.
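A minimal usage sketch with the Hugging Face `transformers` pipeline is shown below. The Turkish prompts are illustrative examples, not from the model's training data, and generation settings such as `max_new_tokens` are placeholder choices; loading the 3B checkpoint is deferred to a helper so the weights (several GB) are only downloaded when generation is actually requested.

```python
MODEL_ID = "wololoo/Llama-3.2-3B-TR-Instruct-DPO"

# Chat-style messages in the Llama 3.2 instruct format.
# Turkish system prompt ("You are a helpful Turkish assistant.")
# and a sample STEM question ("What is photosynthesis?").
messages = [
    {"role": "system", "content": "Sen yardımcı bir Türkçe asistansın."},
    {"role": "user", "content": "Fotosentez nedir?"},
]


def generate(messages, max_new_tokens=256):
    """Run the model on a chat message list and return the reply text.

    Imports transformers lazily so that building prompts does not
    require the library or trigger a multi-GB model download.
    """
    from transformers import pipeline  # assumes transformers is installed

    chat = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype="auto",   # pick bf16/fp16 automatically where supported
        device_map="auto",    # place layers on available GPU(s) or CPU
    )
    out = chat(messages, max_new_tokens=max_new_tokens)
    # The pipeline returns the full conversation; the last message
    # is the assistant's reply.
    return out[0]["generated_text"][-1]["content"]
```

Keeping the prompt construction separate from model loading also makes it easy to reuse the same message list with other backends (e.g. a quantized GGUF build) without touching the generation helper.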