UserLM-8b: A Novel User Simulator
UserLM-8b, developed by Microsoft Research, is an 8-billion-parameter language model with an 8192-token context length, trained to simulate the user role in conversations, in contrast to most LLMs, which are designed to play the assistant role. Its primary purpose is to provide a more realistic and robust environment for evaluating assistant LLMs.
Key Capabilities
- User Role Simulation: Generates initial and follow-up user utterances based on a "task intent" and conversation state.
- Conversation Termination: Predicts when a user would end a conversation by emitting a <|endconversation|> token.
- Enhanced Evaluation: Offers a more diverse and realistic simulation of user behavior than prompting assistant models to play the user, leading to better estimates of assistant LLM performance.
- Robustness: Demonstrates improved adherence to the user role and task intent, though occasional deviations and hallucinations can occur.
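The termination behavior above can be sketched as a small helper that inspects a generated user turn for the end-of-conversation marker. This is a minimal sketch: `generate_fn` is a hypothetical callable standing in for an actual call to UserLM-8b, and only the token string itself comes from the model description.

```python
END_TOKEN = "<|endconversation|>"  # special token UserLM-8b emits to end a dialogue

def next_user_turn(generate_fn, history):
    """Produce the next simulated user utterance.

    generate_fn: hypothetical callable mapping a conversation history
    to raw model text. Returns (utterance, done): done is True once
    the simulated user decides the conversation is over.
    """
    raw = generate_fn(history)
    if END_TOKEN in raw:
        # Keep any text emitted before the marker, then stop.
        return raw.split(END_TOKEN, 1)[0].strip(), True
    return raw.strip(), False

# Stubbed usage: a fake generator whose user ends the conversation.
fake = lambda history: "Thanks, that answers it. <|endconversation|>"
utterance, done = next_user_turn(fake, [])
print(utterance, done)  # -> Thanks, that answers it. True
```

The split-then-strip keeps any final user text that precedes the marker, so a closing remark like "Thanks, that answers it." is not lost when the conversation ends.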
Training and Performance
UserLM-8b was fine-tuned from Llama3-8b-Base on a filtered version of the WildChat-1M dataset, training for 227 hours on four NVIDIA RTX A6000 GPUs. Evaluations show UserLM-8b achieves lower perplexity (i.e., closer distributional alignment with real user utterances) and outperforms assistant-based simulation methods across six intrinsic metrics for user simulators. Extrinsic evaluations confirm it leads to more diverse and realistic conversation simulations.
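The perplexity comparison can be made concrete: perplexity is the exponential of the mean per-token negative log-likelihood, so lower values mean the simulator assigns higher probability to held-out real user utterances. A minimal sketch with made-up log-probabilities (the numbers are purely illustrative, not from the evaluation):

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood over an utterance's tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Illustrative only: a model whose log-probs on real user tokens are
# closer to 0 (higher probability) gets lower perplexity.
aligned  = [-1.2, -0.8, -1.0]   # hypothetical UserLM-style scores
prompted = [-2.5, -2.0, -2.2]   # hypothetical prompted-assistant scores
print(perplexity(aligned) < perplexity(prompted))  # -> True
```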
Intended Use Cases
- Research: Primarily for researchers evaluating assistant LLMs.
- Downstream Applications: Potential for user modeling, foundation for judge models, and synthetic data generation (in conjunction with an assistant LM).
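The synthetic-data use case pairs the user simulator with an assistant LM in an alternating loop. A minimal sketch, assuming both models are wrapped as callables: `user_lm` and `assistant_lm` here are hypothetical stand-ins for real model calls, and only the termination token comes from the model description.

```python
END_TOKEN = "<|endconversation|>"  # emitted by UserLM-8b to end a dialogue

def simulate_dialogue(user_lm, assistant_lm, task_intent, max_turns=10):
    """Alternate user/assistant turns until the simulated user stops.

    user_lm(task_intent, history) -> next user utterance (may contain END_TOKEN)
    assistant_lm(history)         -> assistant reply
    Both are hypothetical callables wrapping real model calls.
    """
    history = []
    for _ in range(max_turns):
        user_text = user_lm(task_intent, history)
        done = END_TOKEN in user_text
        user_text = user_text.replace(END_TOKEN, "").strip()
        if user_text:
            history.append({"role": "user", "content": user_text})
        if done:
            break
        history.append({"role": "assistant", "content": assistant_lm(history)})
    return history

# Stubbed usage: the fake user asks one question, then ends the conversation.
turns = iter(["How do I sort a list in Python?", f"Got it, thanks. {END_TOKEN}"])
dialogue = simulate_dialogue(lambda intent, h: next(turns),
                             lambda h: "Use sorted() or list.sort().",
                             task_intent="learn list sorting")
print(len(dialogue))  # -> 3
```

The `max_turns` cap guards against the occasional deviations noted above, where the simulated user never emits the termination token.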
Note: UserLM-8b is not an assistant LM and is not suitable for end-users seeking assistance with tasks. It is a research release and requires further testing for commercial or real-world applications.