eth-nlped/TutorRL-7B-think

Text Generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: May 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

eth-nlped/TutorRL-7B-think is a 7.6 billion parameter fine-tuned variant of Qwen/Qwen2.5-7B-Instruct, developed by eth-nlped. The model is aligned to act as a math tutor rather than a problem-solver: it is trained with reinforcement learning (GRPO) in synthetic classroom settings to follow pedagogical principles, scaffolding reasoning, guiding with Socratic questioning, and withholding direct solutions to support learning. Its primary uses are interactive math tutoring and research into the educational alignment of LLMs.


TutorRL-7B-think: A Pedagogical Math Tutor

TutorRL-7B-think is a 7.6 billion parameter model, fine-tuned from Qwen/Qwen2.5-7B-Instruct, specifically designed to function as a math tutor rather than a direct problem-solver. Developed by eth-nlped, this model leverages reinforcement learning (GRPO) within a synthetic multi-turn classroom environment to align with pedagogical principles, notably without requiring human-labeled data.
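
The card does not publish the training recipe beyond the GRPO description above, but the general shape of such a setup can be sketched with TRL's GRPOTrainer: a rule-based reward scores each sampled tutor turn for pedagogical behavior (asking guiding questions, not leaking the answer). The reward logic and toy dataset below are illustrative assumptions, not the authors' actual reward or data.

```python
# Minimal sketch of GRPO training with a pedagogical reward, using TRL.
# The reward function and dataset are illustrative assumptions; the actual
# TutorRL reward comes from a synthetic multi-turn classroom environment.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def pedagogy_reward(completions, final_answer, **kwargs):
    """Score each sampled tutor turn: reward Socratic questions,
    penalize revealing the solution the student should reach."""
    rewards = []
    for completion, answer in zip(completions, final_answer):
        score = 0.0
        if "?" in completion:     # asks a guiding question
            score += 1.0
        if answer in completion:  # leaks the final answer -> penalize
            score -= 2.0
        rewards.append(score)
    return rewards

# Toy dataset: a student message plus the answer the tutor should withhold.
train_dataset = Dataset.from_list([
    {"prompt": "Student: I'm stuck on 12 * 15. What do I do?",
     "final_answer": "180"},
])

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=pedagogy_reward,
    args=GRPOConfig(output_dir="tutor-grpo", num_generations=4),
    train_dataset=train_dataset,
)
trainer.train()
```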

Key Capabilities

  • Pedagogical Alignment: Optimized to scaffold reasoning and guide students through Socratic questioning.
  • Solution Withholding: Designed to withhold final solutions when beneficial for the learning process.
  • Annotation-Free Training: Utilizes a scalable, annotation-free approach for training LLMs as educational tutors, as detailed in the research project From Problem-Solving to Teaching Problem-Solving.
  • Hidden Thinking: This variant includes a "thinking" capability: internal reasoning is emitted between <think> ... </think> tags, which an application would typically strip before showing the reply to the student (see the parsing sketch after this list).
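
A minimal way to use the think variant is to generate normally with the standard transformers chat-template workflow and then separate the hidden reasoning from the visible tutor reply. The tag-handling regex below is an assumption based on the <think> ... </think> format described above; exact prompt formatting may differ.

```python
# Sketch: generate one tutor turn and split hidden reasoning from the reply.
# Assumes the standard transformers chat-template workflow for Qwen2.5
# derivatives; <think> tag handling follows the format described above.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "eth-nlped/TutorRL-7B-think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "I don't get how to solve 3x + 5 = 20."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512)
text = tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True)

# Hidden reasoning stays internal; only the tutor's guidance is shown.
thinking = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
reply = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
print(reply)
```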

Good For

  • Interactive math tutoring applications (see the multi-turn example after this list).
  • Generating Socratic dialogues for educational purposes.
  • Research into the educational alignment of large language models.
  • Creating safe and indirect teaching methodologies in problem-solving contexts.
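
For interactive tutoring, each student message is appended to a growing conversation and the model is re-queried. The loop below is a hedged sketch using the transformers text-generation pipeline's chat interface; the toy prompts and the choice to strip <think> spans from the re-fed history are assumptions, not documented behavior of this model.

```python
# Sketch of a multi-turn tutoring loop: conversation history grows with each
# student message, and hidden <think> spans are stripped before display.
import re
from transformers import pipeline

chat = pipeline("text-generation", model="eth-nlped/TutorRL-7B-think",
                device_map="auto")
history = []

for student_msg in ["How do I start on 3x + 5 = 20?",
                    "Subtract 5 from both sides?"]:
    history.append({"role": "user", "content": student_msg})
    result = chat(history, max_new_tokens=256)
    raw_turn = result[0]["generated_text"][-1]["content"]
    # Strip hidden reasoning before display and before re-feeding history
    # (whether to keep <think> spans in context is an assumption here).
    visible = re.sub(r"<think>.*?</think>", "", raw_turn, flags=re.DOTALL).strip()
    history.append({"role": "assistant", "content": visible})
    print("Tutor:", visible)
```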