Joinn/UserMirrorrer-Llama-DPO

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: May 11, 2025 · Architecture: Transformer · Cold

Joinn/UserMirrorrer-Llama-DPO is a 3.2 billion parameter preference-aligned user simulator, fine-tuned from Llama-3.2-3B-Instruct with a 32,768-token context length. Developed by Joinn, the model is designed to simulate user behavior and preferences within recommendation systems. It is trained on extensive user feedback and generates explanatory rationales for its simulated decisions, improving alignment with human preferences. It is primarily intended for research and development on recommender systems, where it can be used to build and test user simulators.


UserMirrorrer-Llama-DPO: Preference-Aligned User Simulator

This model, developed by Joinn, is a 3.2 billion parameter user simulator fine-tuned from Llama-3.2-3B-Instruct. Its core purpose is to accurately simulate user behavior and preferences within recommendation systems. The model was introduced in the paper "Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation" and utilizes the UserMirrorer framework.

Key Capabilities

  • Preference Alignment: Designed to align with human preferences by generating explanatory rationales for simulated decision-making processes.
  • User Behavior Simulation: Simulates user actions and preferences based on extensive user feedback.
  • Recommender System Integration: Intended for use in developing and testing recommender systems.
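As a rough illustration of how the simulator could be driven, the sketch below assembles a plain-text prompt describing a user's history and candidate items, then shows how it would be passed to the model via Hugging Face Transformers. The prompt template and the `build_user_sim_prompt` helper are assumptions for illustration only; the actual input format is defined by the UserMirrorer framework (see the paper and repository).

```python
# Hypothetical sketch: the prompt template below is an assumption, not the
# official UserMirrorer input format.

def build_user_sim_prompt(history: list[str], candidates: list[str]) -> str:
    """Assemble a plain-text prompt asking the simulator to pick a candidate
    item and explain its reasoning (template is hypothetical)."""
    lines = [
        "You are simulating a user of a recommendation system.",
        "Interaction history:",
    ]
    lines += [f"- {item}" for item in history]
    lines.append("Candidate items:")
    lines += [f"{i + 1}. {item}" for i, item in enumerate(candidates)]
    lines.append("Choose one candidate and explain your reasoning.")
    return "\n".join(lines)

prompt = build_user_sim_prompt(
    history=["sci-fi novel A", "space documentary B"],
    candidates=["romance novel C", "astronomy textbook D"],
)

# The prompt would then be fed to the model, e.g.:
# from transformers import pipeline
# pipe = pipeline("text-generation", model="Joinn/UserMirrorrer-Llama-DPO",
#                 torch_dtype="bfloat16")
# print(pipe(prompt, max_new_tokens=256)[0]["generated_text"])
```

Consult the GitHub repository for the exact prompt and output schema before relying on a format like this.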

Training Methodology

The fine-tuning process involved two distinct stages:

  • Supervised Fine-tuning (SFT): Conducted for 1 epoch.
  • Direct Preference Optimization (DPO): Applied for 2 epochs to further refine preference alignment.
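For intuition on the second stage, the snippet below computes the standard DPO objective for a single preference pair: the loss pushes the policy to assign relatively higher log-probability to the chosen (preferred) response than to the rejected one, measured against a frozen reference model. This is a minimal sketch of the general DPO loss, not the model's actual training code; the log-probability values and `beta` are illustrative.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid of the scaled
    difference in policy-vs-reference log-ratios."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Preferring the chosen response (higher policy log-prob) lowers the loss:
low = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
               ref_logp_chosen=-11.0, ref_logp_rejected=-11.0)
high = dpo_loss(logp_chosen=-12.0, logp_rejected=-10.0,
                ref_logp_chosen=-11.0, ref_logp_rejected=-11.0)
```

With a zero margin the loss equals log 2, and it decreases monotonically as the policy's preference for the chosen response grows.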

Good For

  • Researchers and developers working on user simulation in recommendation systems.
  • Evaluating and improving the alignment of recommender systems with user preferences.
  • Exploring user decision-making processes through generated rationales.

For more technical details, refer to the original paper and the GitHub repository.