mahsum/jazari-4b-sft-tr
Jazari-4B-SFT-TR is a 4.5 billion parameter language model developed by Mahsum Aktas, adapted from the Qwen3.5-4B base model. It has been specifically optimized for Turkish language fluency and cultural knowledge through continued pre-training and supervised fine-tuning on Turkish text and examples. This model excels at structured output tasks, demonstrating 100% JSON compliance with proper prompting, making it highly suitable for real-time system monitoring and classification applications.
Loading preview...
Jazari-4B-SFT-TR: Turkish-Optimized Qwen3.5-4B
Jazari-4B-SFT-TR is a 4.5 billion parameter language model developed by Mahsum Aktas, built upon the Qwen3.5-4B architecture. It has undergone significant adaptation for the Turkish language through two key stages:
- Continued Pre-Training (CPT): Utilizing 674 MB of Turkish text over 11,939 steps.
- Supervised Fine-Tuning (SFT): Trained on 73,182 examples across 13 categories over 9,600 steps.
This process, completed on a single RTX 5090 GPU for approximately $75, has resulted in a model with enhanced Turkish fluency and cultural understanding.
Key Capabilities & Performance
Jazari-4B-SFT-TR demonstrates strong performance in Turkish benchmarks, significantly outperforming the base Qwen3.5-4B model in areas like TR-MMLU (85.0% vs 80.0%), TR-ARC (98.0% vs 84.0%), TR-TruthfulQA (95.0% vs 52.5%), and TR-HellaSwag (97.5% vs 47.5%).
Its most notable strength is its structured output reliability. When used with structured prompts and temperature=0, it achieves 100% JSON compliance, as proven in real-world deployments like the ailm system monitor for Linux log classification. In this application, it processes events at 1.2 seconds/event (11x faster than gpt-oss:20b) and uses only 2.7 GB VRAM.
Strengths
- Natural Turkish: Provides more fluent and culturally aware responses compared to its base model.
- Reliable Structured Output: Achieves 100% JSON compliance for classification tasks with appropriate prompting.
- Efficiency: Operates with low VRAM (2.7 GB) and high speed (0.6-1.2 seconds/request), suitable for real-time applications.
- Production-Proven: Actively deployed in the
ailmsystem for 24/7 monitoring.
Limitations
- Overconfidence: May generate incorrect answers rather than stating uncertainty.
- Translation: Not reliable for Turkish to English translation.
- Mathematics: Weaker than the base model on TR-GSM8K (52% vs 66%).
- Domain-Specific Classification: Requires further fine-tuning for highly specialized categories not present in its training data.