MINT-empathy-Qwen3-4B, developed by Hongli Zhan, is a 4-billion-parameter empathic dialogue model fine-tuned from Qwen3-4B. It is trained with a reinforcement learning framework called MINT (Multi-turn Inter-tactic Novelty Training), which diversifies discourse moves across conversation turns so that the model does not keep repeating the same empathy tactics. This Q+D_KL variant was trained on 322 multi-turn emotional support conversations and is intended for applications that need varied, high-quality empathic responses.
MINT-empathy-Qwen3-4B Overview
MINT-empathy-Qwen3-4B is an empathic dialogue model developed by Hongli Zhan, fine-tuned from the Qwen3-4B base model. Its core contribution is the MINT (Multi-turn Inter-tactic Novelty Training) reinforcement learning framework, which addresses the common problem of dialogue models repeating the same empathy tactics from turn to turn. This release is the Q+D_KL variant, reported as the best-performing configuration in the associated research.
Key Capabilities
- Diversified Empathic Responses: MINT trains the model to vary its discourse moves across conversation turns, moving beyond repetitive empathy tactics.
- Reinforcement Learning: Trained with GRPO (Group Relative Policy Optimization) via VERL, using a reward that combines an empathy quality score with a cross-turn tactic novelty signal.
- High-Quality Empathy: Incorporates a quality reward from the PsychoCounsel-Llama3-8B-Reward model.
- Emotional Support Conversations: Trained on 322 multi-turn emotional support conversations, enhancing its ability to provide nuanced support.
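The reward design described above can be illustrated with a small sketch. It is not the author's implementation: the tactic labels, the smoothed KL-divergence novelty term, the additive combination, and the `novelty_weight` parameter are all assumptions chosen for illustration. Only the general shape (a quality score plus a cross-turn novelty signal, fed into GRPO's group-relative advantage) comes from the description.

```python
import math
from statistics import mean, pstdev


def tactic_distribution(tactics):
    """Normalize a list of tactic labels into a probability distribution."""
    counts = {}
    for label in tactics:
        counts[label] = counts.get(label, 0) + 1
    total = len(tactics)
    return {label: n / total for label, n in counts.items()}


def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) over the union of labels, with additive smoothing (assumed)."""
    labels = set(p) | set(q)
    p_s = {k: p.get(k, 0.0) + eps for k in labels}
    q_s = {k: q.get(k, 0.0) + eps for k in labels}
    p_z, q_z = sum(p_s.values()), sum(q_s.values())
    return sum(
        (p_s[k] / p_z) * math.log((p_s[k] / p_z) / (q_s[k] / q_z))
        for k in labels
    )


def combined_reward(quality, response_tactics, history_tactics, novelty_weight=0.5):
    """Hypothetical reward: quality score plus weighted cross-turn novelty."""
    if not response_tactics or not history_tactics:
        return quality  # no earlier turns to compare against
    novelty = kl_divergence(
        tactic_distribution(response_tactics),
        tactic_distribution(history_tactics),
    )
    return quality + novelty_weight * novelty


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two candidate replies with equal quality; the second uses a tactic
# that has not appeared in earlier turns, so it receives a higher reward.
history = ["validation", "validation", "questioning"]
rewards = [
    combined_reward(0.8, ["validation"], history),
    combined_reward(0.8, ["reframing"], history),
]
advantages = group_relative_advantages(rewards)
```

In this sketch, a reply that reuses a tactic already common in the conversation earns little novelty bonus, while one that introduces a new tactic earns more, and the group-relative step turns those rewards into signed advantages for the policy update.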
Good for
- Developing chatbots or virtual assistants that require more natural and less repetitive empathic dialogue.
- Applications in mental health support, customer service, or any other domain that calls for varied, high-quality empathic responses.
- Research into advanced reinforcement learning techniques for dialogue generation and empathy modeling.
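For chatbot-style use, a multi-turn conversation can be passed to the model through the Hugging Face `transformers` chat template interface. The following is a minimal sketch under stated assumptions: `MODEL_ID` is a hypothetical placeholder (the actual repository ID or local path is not given here), the tokenizer is assumed to ship a chat template as Qwen3 tokenizers normally do, and the helper names `build_messages` and `generate_reply` are made up for illustration.

```python
# Hypothetical placeholder: replace with the real repository ID or a local path.
MODEL_ID = "path/to/MINT-empathy-Qwen3-4B"


def build_messages(turns):
    """Map alternating user/assistant strings (user first) to chat messages."""
    roles = ("user", "assistant")
    return [
        {"role": roles[i % 2], "content": text}
        for i, text in enumerate(turns)
    ]


def generate_reply(turns, max_new_tokens=256):
    """Generate the next assistant turn; `turns` should end with a user turn."""
    # Imported lazily so build_messages can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Render the conversation with the tokenizer's chat template.
    prompt = tokenizer.apply_chat_template(
        build_messages(turns), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Passing the full conversation history on each call matters for this model, since its training targets variation in tactics across turns rather than within a single reply.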