MINT-empathy-Qwen3-4B: Enhanced Empathic Dialogue

Developed by Hongli Zhan, this 4-billion-parameter model is fine-tuned from the Qwen3-4B base model using the MINT (Multi-turn Inter-tactic Novelty Training) reinforcement learning framework. MINT optimizes jointly for empathic response quality and for cross-turn novelty of discourse moves, with the aim of reducing repetitive conversational tactics.

Key Capabilities & Differentiators

Optimized for Empathy: Raises the aggregate empathy score on the Lend-an-Ear test set from 3.75 (vanilla Qwen3-4B baseline) to 4.67.
Reduced Tactic Repetition: Lowers "tactic stickiness" from 0.57 to 0.42, promoting a more diverse and natural conversational flow across turns.
Reinforcement Learning Framework: Trained with GRPO via VERL, using a reward function that balances empathy quality against cross-turn tactic diversity.
Best Overall MINT Checkpoint: Identified as the strongest MINT checkpoint on the joint tradeoff between empathy quality and reduced tactic repetition.
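
To make the idea of a reward that balances empathy quality against cross-turn tactic diversity more concrete, here is a minimal Python sketch. It is not the MINT implementation: the Jaccard-overlap definition of repetition, the function names, the `novelty_weight` parameter, and the `max_empathy` scale of 5.0 are all assumptions made for illustration, and the actual reward and the actual definition of tactic stickiness may differ.

```python
from typing import Sequence, Set


def tactic_overlap(prev: Set[str], curr: Set[str]) -> float:
    """Jaccard overlap between the tactic sets of two turns.

    Assumption: repetition is measured as set overlap. This is an
    illustrative stand-in, not the metric used by MINT.
    """
    union = prev | curr
    if not union:
        return 0.0
    return len(prev & curr) / len(union)


def stickiness(turn_tactics: Sequence[Set[str]]) -> float:
    """Mean overlap between consecutive turns (hypothetical definition).

    Returns 0.0 when there are fewer than two turns.
    """
    pairs = list(zip(turn_tactics, turn_tactics[1:]))
    if not pairs:
        return 0.0
    return sum(tactic_overlap(a, b) for a, b in pairs) / len(pairs)


def combined_reward(
    empathy_score: float,
    prev_tactics: Set[str],
    curr_tactics: Set[str],
    novelty_weight: float = 0.5,  # hypothetical weighting, not from the source
    max_empathy: float = 5.0,     # assumed score scale, not from the source
) -> float:
    """Weighted sum of normalized empathy quality and tactic novelty."""
    quality = empathy_score / max_empathy
    novelty = 1.0 - tactic_overlap(prev_tactics, curr_tactics)
    return (1.0 - novelty_weight) * quality + novelty_weight * novelty


if __name__ == "__main__":
    # Tactic labels here are made up for the example.
    turns = [{"validation"}, {"validation"}, {"question"}]
    print(stickiness(turns))
    print(combined_reward(4.0, {"validation"}, {"question"}))
```

Under these assumptions, a response that repeats the previous turn's tactics earns less reward than an equally empathic response that uses a different tactic, which is the kind of pressure the card describes.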

Intended Use Cases

Research on Empathic Dialogue: Suited to academic research on understanding and generating empathic responses.
Discourse Diversity Studies: Useful for exploring and improving the variety of conversational tactics in AI-generated dialogue.
Supportive Response Generation: Applicable in research contexts for developing AI systems that provide more supportive and nuanced conversational interactions.

This model is a research artifact evaluated on fixed conversation contexts; it is not intended for use as a therapy system.
