Joinn/UserMirrorrer-Qwen-DPO

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:May 13, 2025License:cc-by-nc-4.0Architecture:Transformer Open Weights Cold

Joinn/UserMirrorrer-Qwen-DPO is a 3.1 billion parameter Qwen-2.5-3B-Instruct based model developed by Joinn, specifically fine-tuned as a user simulator for recommender systems. It leverages extensive user feedback and Direct Preference Optimization (DPO) to achieve preference alignment and uses decision-making processes as explanatory rationales. This model excels at simulating user behavior to reduce ambiguity in recommendation simulation samples.

Loading preview...

Overview

Joinn/UserMirrorrer-Qwen-DPO is a 3.1 billion parameter model built upon the Qwen-2.5-3B-Instruct base, developed by Joinn. Its primary purpose is to function as a preference-aligned user simulator within recommender systems (RSs). The model was introduced in the paper "Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation" and is designed to simulate user behavior by incorporating user feedback.

Key Capabilities and Features

  • User Simulation: Specifically engineered to mimic user behavior in recommender systems.
  • Preference Alignment: Achieves better alignment with user preferences through extensive feedback integration.
  • Explanatory Rationales: Utilizes decision-making processes as explanatory rationales to clarify simulation samples.
  • Fine-tuning: Underwent a two-stage fine-tuning process:
    • Supervised Finetuning (SFT) for 1 epoch.
    • Direct Preference Optimization (DPO) for 2 epochs.
  • Dataset: Trained on the UserMirrorer dataset.

Use Cases

This model is particularly well-suited for:

  • Recommender System Research: Developing and testing new recommender algorithms by simulating realistic user interactions.
  • User Behavior Modeling: Gaining insights into how users might interact with recommended items.
  • Reducing Ambiguity: Generating simulation samples with clearer, more interpretable decision-making processes.

For more technical details, refer to the associated research paper.