HumanLM-Opinion: Simulating User States
HumanLM-Opinion, developed by snap-stanford, is an 8-billion-parameter user simulator built on the Qwen3-8B base model. Unlike conventional fine-tuning, which focuses on imitating responses directly, HumanLM is trained with Group Relative Policy Optimization (GRPO) using a state-alignment method: it explicitly models six psychologically grounded user-state dimensions (belief, goal, value, stance, emotion, and communication style) and conditions its generated responses on them.
This specific checkpoint is fine-tuned on the Humanual-Opinion benchmark, which comprises Reddit users' opinionated responses in personal-issue discussion threads. During generation, the model first emits a `<think>` block in which it reasons about these latent states before producing the final response.
Key Capabilities
- State-Aligned Response Generation: Generates responses that reflect a user's underlying beliefs, emotions, values, and communication style.
- Opinionated User Simulation: Excels at producing diverse, opinionated feedback, particularly in discussion-based contexts.
- Contextual Reasoning: Utilizes a `<think>` block to reason about latent user states, enhancing the realism and depth of simulated responses.
- High Naturalness: Achieved a 76.6% rating of "quite natural" or "indistinguishable from human" in real-time user studies.
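Because the model's output interleaves reasoning with the simulated reply, a caller typically needs to strip the `<think>` block before using the response. The sketch below shows one way to do this, assuming only the output format described above (a `<think>...</think>` block followed by the final response); `split_think` is an illustrative helper, not part of any released API.

```python
import re

def split_think(response: str) -> tuple[str, str]:
    """Separate the <think> reasoning block from the final simulated reply.

    Returns (reasoning, reply). If no <think> block is present, the
    reasoning part is empty and the whole string is treated as the reply.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    reply = response[match.end():].strip()
    return reasoning, reply

# Example with a made-up output in the described format:
raw = "<think>Stance: skeptical; emotion: frustrated.</think>Honestly, I doubt this would work."
reasoning, reply = split_think(raw)
```

In practice you might log or inspect the `reasoning` part when auditing how the simulator inferred the user's state, while passing only `reply` downstream.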
Good for
- User Research: Understanding how different user personas might react to content or scenarios.
- Content Testing: Predicting audience reactions to posts, articles, or policy drafts.
- AI Alignment: Generating varied and realistic user feedback to train and evaluate collaborative AI systems.
- Social Simulation: Modeling opinion dynamics and interactions within online communities.