Thestral-0.1-tr-chat-7B: Turkish-Optimized Chat Model
Thestral-0.1-tr-chat-7B is a 7-billion-parameter language model from NovusResearch, built on the Mistral-7B-v0.1 architecture. Its primary distinction is comprehensive fine-tuning on a wide range of Turkish datasets, making it highly proficient at Turkish language understanding and generation.
Key Capabilities & Training
- Turkish Language Specialization: The model underwent full fine-tuning on translated Turkish datasets, including Turkish versions of teknium/OpenHermes-2.5 and Open-Orca/SlimOrca, giving it strong performance in Turkish conversational contexts.
- Base Model: Uses mistralai/Mistral-7B-v0.1 as its foundational architecture, inheriting its robust causal language modeling capabilities.
- Training Configuration: Fine-tuned with axolotl using a sequence length of 8192 and a learning rate of 5e-6 over 2 epochs, with gradient checkpointing and Flash Attention enabled.
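A minimal sketch of loading the model for Turkish generation with Hugging Face transformers. The repository id and the `[INST]`-style prompt format are assumptions (the latter inherited from the Mistral base model); verify both against the model card and tokenizer config before relying on them:

```python
def build_prompt(user_message: str) -> str:
    # Assumed Mistral-style instruction format inherited from the
    # base model; check the tokenizer's chat template to confirm.
    return f"<s>[INST] {user_message} [/INST]"


def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept inside the function so the module loads
    # even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed Hugging Face repository id, based on the org/model names above.
    model_id = "NovusResearch/Thestral-0.1-tr-chat-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example call (downloads the weights on first use):
# generate("Türkiye'nin başkenti neresidir?")
```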
Performance Metrics (Turkish Leaderboard)
Evaluated on the OpenLLMTurkishLeaderboard, the model achieves an average score of 36.41 across six tasks:
- MMLU: 40.64
- TruthfulQA: 47.90
- Winogrande: 50.86
- AI2 Reasoning Challenge: 27.24
- HellaSwag: 33.90
- GSM8k: 17.91
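The reported average is the unweighted mean of the six per-task scores:

```python
# Per-task scores from the OpenLLMTurkishLeaderboard results above.
scores = {
    "MMLU": 40.64,
    "TruthfulQA": 47.90,
    "Winogrande": 50.86,
    "ARC": 27.24,
    "HellaSwag": 33.90,
    "GSM8k": 17.91,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 36.41
```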
Ideal Use Cases
- Turkish Chatbots: Excellent for developing conversational AI agents that interact in Turkish.
- Turkish Content Generation: Suitable for generating text, summaries, or creative content in Turkish.
- Research & Development: A strong base for further fine-tuning or research into Turkish NLP applications.
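For the chatbot use case, multi-turn history has to be flattened back into a single prompt string. A small sketch, assuming the Mistral-style template inherited from the base model (the exact template is an assumption; check the tokenizer's `chat_template`):

```python
def format_chat(history: list[tuple[str, str]], user_message: str) -> str:
    """Flatten (user, assistant) turn pairs plus the newest user message
    into one prompt string. The <s>[INST] ... [/INST]</s> layout is an
    assumed Mistral convention, not confirmed by the model card."""
    prompt = ""
    for user_turn, assistant_turn in history:
        prompt += f"<s>[INST] {user_turn} [/INST] {assistant_turn}</s>"
    prompt += f"<s>[INST] {user_message} [/INST]"
    return prompt
```

The resulting string can be fed directly to the tokenizer; alternatively, `tokenizer.apply_chat_template` handles this automatically when the tokenizer ships a template.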