KAT-2-33B-FT: DPO-Aligned Academic Tutor
KAT-2-33B-FT is a 32.8-billion-parameter language model developed by Preston Mills at Progga AI and fine-tuned specifically for academic tutoring. Built on the Qwen2ForCausalLM architecture from the progga-ai/KAT-2-33B-BASE model, it uses Direct Preference Optimization (DPO) to instill academic integrity and effective pedagogical behaviors. The model was trained on 42,610 preference pairs for 3 epochs, reaching an evaluation reward accuracy of 89.6%, a significant improvement over the base model.
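DPO optimizes the policy directly from preference pairs, with no separate reward model. As a minimal sketch of the per-pair DPO loss (the log-probabilities below are illustrative placeholders, not values from this model):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Before training, policy == reference, so every pair costs log(2):
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Once the policy upweights the preferred answer, the loss falls:
print(dpo_loss(-8.0, -13.0, -10.0, -12.0) < math.log(2))  # True
```

The reported reward accuracy corresponds to the fraction of evaluation pairs where the implicit reward (the beta-scaled log-ratio margin above) of the chosen response exceeds that of the rejected one.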
Key Capabilities
- Academic Integrity Enforcement: Refuses to complete graded work, instead offering hints and guidance.
- Socratic Tutoring: Encourages students to attempt problems first before providing assistance.
- Graduated Hints: Delivers progressively more detailed guidance based on student engagement and effort.
- Misconception Diagnosis: Identifies and addresses specific conceptual gaps in student understanding.
- Long Context: Supports a 32,768-token context window, allowing for extensive conversational history and complex problem-solving scenarios.
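The graduated-hint behavior can be framed as a simple escalation policy: each genuine attempt unlocks a more detailed hint, but a full solution is never released. A toy sketch (the hint texts and levels are hypothetical illustrations, not drawn from the model's training data):

```python
# Hypothetical hint ladder: effortful attempts unlock deeper levels,
# but no level ever reveals a complete solution.
HINT_LEVELS = [
    "Restate the problem in your own words. What is being asked?",
    "Which concept from this unit does the problem rely on?",
    "Set up the first step of the solution, then try to continue.",
]

def next_hint(attempts: int, showed_effort: bool) -> str:
    """Pick a hint level from the student's attempt count and engagement."""
    if attempts == 0:
        return "Please share your attempt first - what have you tried?"
    # Without visible effort, stay at the shallowest level.
    level = min(attempts - 1, len(HINT_LEVELS) - 1) if showed_effort else 0
    return HINT_LEVELS[level]

print(next_hint(0, False))  # asks for an attempt before hinting at all
print(next_hint(3, True))   # deepest hint, still not a full answer
```

The DPO preference pairs plausibly contrast this kind of graduated response (chosen) against direct answer-giving (rejected), which is what the integrity-enforcement and Socratic behaviors above reward.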
Ideal Use Cases
- Educational Platforms: Integrating into online learning environments for personalized, integrity-focused academic support.
- Student Support Systems: Providing AI-powered tutoring that guides students through learning challenges without giving direct answers.
- Research in AI Ethics: Studying DPO alignment for enforcing ethical guidelines and specific behavioral constraints in LLMs, particularly in sensitive domains like education.