Joinn/UserMirrorrer-Qwen-DPO
Joinn/UserMirrorrer-Qwen-DPO is a 3.1-billion-parameter model based on Qwen-2.5-3B-Instruct, developed by Joinn and fine-tuned specifically as a user simulator for recommender systems. It leverages extensive user feedback and Direct Preference Optimization (DPO) to achieve preference alignment, and it expresses its decision-making process as an explanatory rationale, which helps reduce ambiguity in recommendation simulation samples.
Overview
Joinn/UserMirrorrer-Qwen-DPO is a 3.1-billion-parameter model built on the Qwen-2.5-3B-Instruct base and developed by Joinn. Its primary purpose is to serve as a preference-aligned user simulator for recommender systems (RSs). The model was introduced in the paper "Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation" and is designed to simulate user behavior by incorporating user feedback.
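Querying a user simulator like this typically means handing the model a user's interaction history plus a slate of candidate items in a chat-style prompt. The sketch below shows one plausible way to frame that request; the system prompt wording and item formatting are assumptions for illustration, since the exact prompt schema used by UserMirrorer is not documented here.

```python
# Hypothetical prompt construction for a user-simulation request.
# The system prompt and field layout below are illustrative assumptions,
# not the documented UserMirrorer prompt format.

def build_simulation_messages(history, candidates):
    """Assemble chat messages asking the model to act as the user."""
    system = (
        "You are simulating a user of a recommender system. "
        "Given the user's interaction history and a list of candidate items, "
        "choose the item the user would most likely engage with and explain "
        "the decision-making process behind the choice."
    )
    user = (
        "Interaction history:\n"
        + "\n".join(f"- {item}" for item in history)
        + "\n\nCandidate items:\n"
        + "\n".join(f"{i + 1}. {item}" for i, item in enumerate(candidates))
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_simulation_messages(
    history=["The Matrix", "Blade Runner"],
    candidates=["Inception", "Notting Hill"],
)
```

A message list in this shape can be passed to any chat-completion interface (for example, via a chat template) to obtain the simulated user's choice.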
Key Capabilities and Features
- User Simulation: Specifically engineered to mimic user behavior in recommender systems.
- Preference Alignment: Achieves better alignment with user preferences through extensive feedback integration.
- Explanatory Rationales: Emits its decision-making process as an explanatory rationale, making simulation samples easier to interpret.
- Fine-tuning: Underwent a two-stage fine-tuning process:
  - Supervised Fine-tuning (SFT) for 1 epoch.
  - Direct Preference Optimization (DPO) for 2 epochs.
- Dataset: Trained on the UserMirrorer dataset.
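The DPO stage above consumes preference pairs: for each prompt, a preferred ("chosen") response and a dispreferred ("rejected") one. Below is a minimal sketch of that record shape, assuming the common `prompt`/`chosen`/`rejected` layout; the actual schema of the UserMirrorer dataset may differ.

```python
# Minimal sketch of a DPO preference record. The prompt/chosen/rejected
# field names follow the common DPO convention and are an assumption here,
# not the documented UserMirrorer dataset schema.

def make_dpo_record(prompt, chosen, rejected):
    """Bundle one preference pair for DPO training."""
    assert chosen != rejected, "chosen and rejected responses must differ"
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

record = make_dpo_record(
    prompt=(
        "Given the history [The Matrix, Blade Runner], which candidate "
        "would the user pick: 1. Inception or 2. Notting Hill?"
    ),
    chosen="I would pick 1 (Inception): it matches my taste for sci-fi.",
    rejected="I would pick 2 (Notting Hill).",
)
```

Pairing a well-reasoned chosen response against a weaker rejected one is what steers the model toward responses that reflect actual user preferences.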
Use Cases
This model is particularly well-suited for:
- Recommender System Research: Developing and testing new recommender algorithms by simulating realistic user interactions.
- User Behavior Modeling: Gaining insights into how users might interact with recommended items.
- Reducing Ambiguity: Generating simulation samples with clearer, more interpretable decision-making processes.
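To make the ambiguity-reduction point concrete: because the simulator produces a rationale alongside its choice, downstream code can parse both and discard samples with no clear decision. A small sketch, assuming a hypothetical `Choice: <n>` / `Rationale: <text>` output convention (the model's real output format may differ):

```python
import re

# Parse a simulated user's response into a structured choice plus rationale.
# The "Choice:"/"Rationale:" format is a hypothetical convention for
# illustration, not the model's documented output format.

def parse_simulation(output):
    """Extract the chosen item index and its rationale, or None if ambiguous."""
    choice_match = re.search(r"Choice:\s*(\d+)", output)
    rationale_match = re.search(r"Rationale:\s*(.+)", output, re.DOTALL)
    if not choice_match:
        return None  # ambiguous sample: no parseable decision
    return {
        "choice": int(choice_match.group(1)),
        "rationale": rationale_match.group(1).strip() if rationale_match else "",
    }

parsed = parse_simulation(
    "Choice: 1\nRationale: The user favors sci-fi, so Inception fits best."
)
```

Filtering out `None` results is one simple way such rationales can yield cleaner, more interpretable simulation samples.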
For more technical details, refer to the associated research paper.