eth-nlped/TutorRL-7B
Text Generation · Open Weights · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: May 27, 2025 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1
TutorRL-7B is a 7.6-billion-parameter variant of Qwen2.5-7B-Instruct developed by eth-nlped, fine-tuned with reinforcement learning (GRPO) for pedagogical alignment. The model acts as a math tutor: rather than solving problems outright, it is optimized to scaffold reasoning and guide students through Socratic questioning. It is suited to interactive math tutoring and to research on educational LLM alignment, and supports a 131,072-token context length.
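The sketch below shows one way to try the model locally. It is a minimal example, not an official quickstart: it assumes the checkpoint is published on the Hugging Face Hub under the id eth-nlped/TutorRL-7B (matching this card's title) and that it uses the standard Qwen2.5 chat template; the prompt text is illustrative.

```python
# Minimal sketch: querying TutorRL-7B via Hugging Face transformers.
# Assumptions: Hub id "eth-nlped/TutorRL-7B" and a standard Qwen2.5 chat
# template -- verify both against the actual model repository. FP8-quantized
# checkpoints may additionally require a compatible runtime (e.g. vLLM).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "eth-nlped/TutorRL-7B"  # assumed Hub id from this card's title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype the weights were published in
    device_map="auto",
)

# A tutoring-style exchange: the model is trained to scaffold, not to solve.
messages = [
    {"role": "user", "content": "I'm stuck on solving 2x + 3 = 11. Can you help?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model is tuned for Socratic guidance, a well-behaved response should probe the student's thinking (for example, asking what operation isolates the variable) rather than returning the answer x = 4 directly.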