eth-nlped/TutorRL-7B-think

Text Generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: May 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

eth-nlped/TutorRL-7B-think is a 7.6 billion parameter fine-tuned variant of Qwen/Qwen2.5-7B-Instruct, developed by eth-nlped. The model is aligned to act as a math tutor rather than a problem-solver: it is trained with reinforcement learning (GRPO) in synthetic classroom settings to follow pedagogical principles, scaffolding reasoning, guiding with Socratic questioning, and withholding direct solutions to support learning. Its primary uses are interactive math tutoring and research into the educational alignment of LLMs.


TutorRL-7B-think: A Pedagogical Math Tutor

TutorRL-7B-think is a 7.6 billion parameter model, fine-tuned from Qwen/Qwen2.5-7B-Instruct, specifically designed to function as a math tutor rather than a direct problem-solver. Developed by eth-nlped, this model leverages reinforcement learning (GRPO) within a synthetic multi-turn classroom environment to align with pedagogical principles, notably without requiring human-labeled data.
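
The card does not publish the training recipe beyond the GRPO description above, but the general shape of such a setup can be sketched with TRL's GRPOTrainer: a rule-based reward scores each sampled tutor turn for pedagogical behavior (asking guiding questions, not leaking the answer). The reward logic and toy dataset below are illustrative assumptions, not the authors' actual reward or data.

```python
# Minimal sketch of GRPO training with a pedagogical reward, using TRL.
# The reward function and dataset are illustrative assumptions; the actual
# TutorRL reward comes from a synthetic multi-turn classroom environment.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def pedagogy_reward(completions, final_answer, **kwargs):
    """Score each sampled tutor turn: reward Socratic questions,
    penalize revealing the solution the student should reach."""
    rewards = []
    for completion, answer in zip(completions, final_answer):
        score = 0.0
        if "?" in completion:     # asks a guiding question
            score += 1.0
        if answer in completion:  # leaks the final answer -> penalize
            score -= 2.0
        rewards.append(score)
    return rewards

# Toy dataset: a student message plus the answer the tutor should withhold.
train_dataset = Dataset.from_list([
    {"prompt": "Student: I'm stuck on 12 * 15. What do I do?",
     "final_answer": "180"},
])

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=pedagogy_reward,
    args=GRPOConfig(output_dir="tutor-grpo", num_generations=4),
    train_dataset=train_dataset,
)
trainer.train()
```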

Key Capabilities

  • Pedagogical Alignment: Optimized to scaffold reasoning and guide students through Socratic questioning.
  • Solution Withholding: Designed to withhold final solutions when beneficial for the learning process.
  • Annotation-Free Training: Utilizes a scalable, annotation-free approach for training LLMs as educational tutors, as detailed in the research project From Problem-Solving to Teaching Problem-Solving.
  • Hidden Thinking: This variant includes a "thinking" capability: internal reasoning is emitted between <think> ... </think> tags, which an application would typically strip before showing the reply to the student (see the parsing sketch after this list).
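
A minimal way to use the think variant is to generate normally with the standard transformers chat-template workflow and then separate the hidden reasoning from the visible tutor reply. The tag-handling regex below is an assumption based on the <think> ... </think> format described above; exact prompt formatting may differ.

```python
# Sketch: generate one tutor turn and split hidden reasoning from the reply.
# Assumes the standard transformers chat-template workflow for Qwen2.5
# derivatives; <think> tag handling follows the format described above.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "eth-nlped/TutorRL-7B-think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "I don't get how to solve 3x + 5 = 20."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512)
text = tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True)

# Hidden reasoning stays internal; only the tutor's guidance is shown.
thinking = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
reply = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
print(reply)
```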

Good For

  • Interactive math tutoring applications (see the multi-turn example after this list).
  • Generating Socratic dialogues for educational purposes.
  • Research into the educational alignment of large language models.
  • Creating safe and indirect teaching methodologies in problem-solving contexts.
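
For interactive tutoring, each student message is appended to a growing conversation and the model is re-queried. The loop below is a hedged sketch using the transformers text-generation pipeline's chat interface; the toy prompts and the choice to strip <think> spans from the re-fed history are assumptions, not documented behavior of this model.

```python
# Sketch of a multi-turn tutoring loop: conversation history grows with each
# student message, and hidden <think> spans are stripped before display.
import re
from transformers import pipeline

chat = pipeline("text-generation", model="eth-nlped/TutorRL-7B-think",
                device_map="auto")
history = []

for student_msg in ["How do I start on 3x + 5 = 20?",
                    "Subtract 5 from both sides?"]:
    history.append({"role": "user", "content": student_msg})
    result = chat(history, max_new_tokens=256)
    raw_turn = result[0]["generated_text"][-1]["content"]
    # Strip hidden reasoning before display and before re-feeding history
    # (whether to keep <think> spans in context is an assumption here).
    visible = re.sub(r"<think>.*?</think>", "", raw_turn, flags=re.DOTALL).strip()
    history.append({"role": "assistant", "content": visible})
    print("Tutor:", visible)
```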