Joinn/UserMirrorrer-Llama-DPO

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: May 11, 2025 · Architecture: Transformer · Cold

Joinn/UserMirrorrer-Llama-DPO is a 3.2 billion parameter preference-aligned user simulator, fine-tuned from Llama-3.2-3B-Instruct with a 32,768-token context length. Developed by Joinn, the model is designed to simulate user behavior and preferences within recommendation systems. It is trained on extensive user feedback and generates explanatory rationales for its simulated decisions, improving alignment with human preferences. It is primarily intended for research and development on recommender systems, where it can be used to build and test user simulators.


UserMirrorrer-Llama-DPO: Preference-Aligned User Simulator

This model, developed by Joinn, is a 3.2 billion parameter user simulator fine-tuned from Llama-3.2-3B-Instruct. Its core purpose is to accurately simulate user behavior and preferences within recommendation systems. The model was introduced in the paper "Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation" and utilizes the UserMirrorer framework.

Key Capabilities

  • Preference Alignment: Designed to align with human preferences by generating explanatory rationales for simulated decision-making processes.
  • User Behavior Simulation: Simulates user actions and preferences based on extensive user feedback.
  • Recommender System Integration: Intended for use in developing and testing recommender systems.
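As a rough illustration of how the simulator could be driven, the sketch below assembles a plain-text prompt describing a user's history and candidate items, then shows how it would be passed to the model via Hugging Face Transformers. The prompt template and the `build_user_sim_prompt` helper are assumptions for illustration only; the actual input format is defined by the UserMirrorer framework (see the paper and repository).

```python
# Hypothetical sketch: the prompt template below is an assumption, not the
# official UserMirrorer input format.

def build_user_sim_prompt(history: list[str], candidates: list[str]) -> str:
    """Assemble a plain-text prompt asking the simulator to pick a candidate
    item and explain its reasoning (template is hypothetical)."""
    lines = [
        "You are simulating a user of a recommendation system.",
        "Interaction history:",
    ]
    lines += [f"- {item}" for item in history]
    lines.append("Candidate items:")
    lines += [f"{i + 1}. {item}" for i, item in enumerate(candidates)]
    lines.append("Choose one candidate and explain your reasoning.")
    return "\n".join(lines)

prompt = build_user_sim_prompt(
    history=["sci-fi novel A", "space documentary B"],
    candidates=["romance novel C", "astronomy textbook D"],
)

# The prompt would then be fed to the model, e.g.:
# from transformers import pipeline
# pipe = pipeline("text-generation", model="Joinn/UserMirrorrer-Llama-DPO",
#                 torch_dtype="bfloat16")
# print(pipe(prompt, max_new_tokens=256)[0]["generated_text"])
```

Consult the GitHub repository for the exact prompt and output schema before relying on a format like this.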

Training Methodology

The fine-tuning process involved two distinct stages:

  • Supervised Fine-tuning (SFT): Conducted for 1 epoch.
  • Direct Preference Optimization (DPO): Applied for 2 epochs to further refine preference alignment.
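For intuition on the second stage, the snippet below computes the standard DPO objective for a single preference pair: the loss pushes the policy to assign relatively higher log-probability to the chosen (preferred) response than to the rejected one, measured against a frozen reference model. This is a minimal sketch of the general DPO loss, not the model's actual training code; the log-probability values and `beta` are illustrative.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid of the scaled
    difference in policy-vs-reference log-ratios."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Preferring the chosen response (higher policy log-prob) lowers the loss:
low = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
               ref_logp_chosen=-11.0, ref_logp_rejected=-11.0)
high = dpo_loss(logp_chosen=-12.0, logp_rejected=-10.0,
                ref_logp_chosen=-11.0, ref_logp_rejected=-11.0)
```

With a zero margin the loss equals log 2, and it decreases monotonically as the policy's preference for the chosen response grows.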

Good For

  • Researchers and developers working on user simulation in recommendation systems.
  • Evaluating and improving the alignment of recommender systems with user preferences.
  • Exploring user decision-making processes through generated rationales.

For more technical details, refer to the original paper and the GitHub repository.