mahsum/jazari-4b-sft-tr

  • Modality: Vision
  • Concurrency Cost: 1
  • Model Size: 4.5B
  • Quant: BF16
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Mar 27, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

Jazari-4B-SFT-TR is a 4.5 billion parameter language model developed by Mahsum Aktas, adapted from the Qwen3.5-4B base model. It has been optimized for Turkish language fluency and cultural knowledge through continued pre-training on Turkish text and supervised fine-tuning on Turkish examples. With structured prompts and temperature=0 it reports 100% JSON compliance, which makes it well suited to real-time system monitoring and classification applications.


Jazari-4B-SFT-TR: Turkish-Optimized Qwen3.5-4B

Jazari-4B-SFT-TR is a 4.5 billion parameter language model developed by Mahsum Aktas, built on the Qwen3.5-4B base model. It was adapted to Turkish in two stages:

  • Continued Pre-Training (CPT): Trained on 674 MB of Turkish text over 11,939 steps.
  • Supervised Fine-Tuning (SFT): Trained on 73,182 examples across 13 categories over 9,600 steps.

The whole process ran on a single RTX 5090 GPU and cost approximately $75. The resulting model has better Turkish fluency and cultural understanding than its base model.

Key Capabilities & Performance

Jazari-4B-SFT-TR significantly outperforms the base Qwen3.5-4B model on Turkish benchmarks, including:

  • TR-MMLU: 85.0% vs 80.0%
  • TR-ARC: 98.0% vs 84.0%
  • TR-TruthfulQA: 95.0% vs 52.5%
  • TR-HellaSwag: 97.5% vs 47.5%
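To make the size of each gap easier to compare, the short sketch below computes the difference in percentage points between the fine-tuned and base scores. It uses only the figures quoted in this card, including the TR-GSM8K result listed under Limitations. The names `SCORES` and `deltas` are illustrative and not part of any published tooling.

```python
# Scores in percent as (fine-tuned, base), copied from the figures in this card.
SCORES = {
    "TR-MMLU": (85.0, 80.0),
    "TR-ARC": (98.0, 84.0),
    "TR-TruthfulQA": (95.0, 52.5),
    "TR-HellaSwag": (97.5, 47.5),
    "TR-GSM8K": (52.0, 66.0),  # regression, see Limitations
}


def deltas(scores):
    """Return fine-tuned minus base, in percentage points, per benchmark."""
    return {name: round(tuned - base, 1) for name, (tuned, base) in scores.items()}


if __name__ == "__main__":
    for name, delta in deltas(SCORES).items():
        # The explicit sign makes gains and regressions easy to tell apart.
        print(f"{name:<14} {delta:+.1f} pp")
```

The largest gains are on TR-HellaSwag and TR-TruthfulQA, while TR-GSM8K is the one benchmark in the card where the base model scores higher.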

Its most notable strength is reliable structured output. With structured prompts and temperature=0, it achieves 100% JSON compliance, as observed in production in the ailm system monitor, where it classifies Linux logs. In this application it processes events at 1.2 seconds/event (11x faster than gpt-oss:20b) and uses only 2.7 GB of VRAM.
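A minimal sketch of what "structured prompt plus compliance check" can look like in practice is shown below. It is not the ailm implementation: the category names, the `category`/`reason` fields, and the helper names are all assumptions made for illustration. The model call itself is left out, so the code only builds the prompt and validates whatever text comes back (generated with temperature=0 in the setup described above).

```python
import json

# Hypothetical label set; the categories ailm actually uses are not given in this card.
CATEGORIES = ("normal", "warning", "critical")


def build_prompt(log_line, categories=CATEGORIES):
    """Build a classification prompt that pins down the expected JSON shape."""
    options = ", ".join(f'"{c}"' for c in categories)
    return (
        "Classify the Linux log line below.\n"
        "Respond with a single JSON object of the form "
        f'{{"category": <one of {options}>, "reason": <short string>}} '
        "and nothing else.\n"
        f"Log line: {log_line}"
    )


def parse_classification(raw, categories=CATEGORIES):
    """Return the parsed object if `raw` is compliant, otherwise raise ValueError."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("top-level value is not a JSON object")
    if obj.get("category") not in categories:
        raise ValueError(f"unexpected category: {obj.get('category')!r}")
    if not isinstance(obj.get("reason"), str):
        raise ValueError("missing or non-string 'reason'")
    return obj


def compliance_rate(raw_outputs, categories=CATEGORIES):
    """Fraction of raw model outputs that parse and match the assumed schema."""
    if not raw_outputs:
        return 0.0
    ok = 0
    for raw in raw_outputs:
        try:
            parse_classification(raw, categories)
            ok += 1
        except ValueError:
            pass
    return ok / len(raw_outputs)
```

Running `compliance_rate` over a batch of logged responses is one way to check a JSON compliance figure against your own prompts. The validator is deliberately strict: any extra text around the JSON object counts as a failure.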

Strengths

  • Natural Turkish: Provides more fluent and culturally aware responses than its base model.
  • Reliable Structured Output: Achieves 100% JSON compliance for classification tasks with appropriate prompting.
  • Efficiency: Operates with low VRAM (2.7 GB) and high speed (0.6-1.2 seconds/request), suitable for real-time applications.
  • Production-Proven: Actively deployed in the ailm system for 24/7 monitoring.

Limitations

  • Overconfidence: May generate incorrect answers rather than stating uncertainty.
  • Translation: Not reliable for Turkish-to-English translation.
  • Mathematics: Weaker than the base model on TR-GSM8K (52% vs 66%).
  • Domain-Specific Classification: Requires further fine-tuning for highly specialized categories not present in its training data.