snap-stanford/humanlm-opinion
HumanLM-Opinion is an 8 billion parameter user simulator developed by snap-stanford, built upon the Qwen3-8B base model and trained with GRPO on the Humanual-Opinion benchmark. This model specializes in generating opinionated responses that capture underlying user states across cognitive, normative, affective, and linguistic dimensions, rather than merely imitating surface-level language. Its primary use case is simulating diverse user feedback for research, content testing, and AI alignment, offering a 32768-token context length.
Loading preview...
HumanLM-Opinion: Simulating User States
HumanLM-Opinion, developed by snap-stanford, is an 8 billion parameter user simulator built on the Qwen3-8B base model. Unlike traditional fine-tuning that focuses on response imitation, HumanLM is trained using Group Relative Policy Optimization (GRPO) with a unique state alignment method. It explicitly models and generates responses based on six psychologically-grounded user state dimensions: belief, goal, value, stance, emotion, and communication style.
This specific checkpoint is fine-tuned on the Humanual-Opinion benchmark, which comprises Reddit users' opinionated responses in personal-issue discussion threads. The model's generation process includes a <think> block where it reasons about these latent states before producing the final response.
Key Capabilities
- State-Aligned Response Generation: Generates responses that reflect a user's underlying beliefs, emotions, values, and communication style.
- Opinionated User Simulation: Excels at producing diverse, opinionated feedback, particularly in discussion-based contexts.
- Contextual Reasoning: Utilizes a
<think>block to reason about latent user states, enhancing the realism and depth of simulated responses. - High Naturalness: Achieved a 76.6% rating of "quite natural" or "indistinguishable from human" in real-time user studies.
Good for
- User Research: Understanding how different user personas might react to content or scenarios.
- Content Testing: Predicting audience reactions to posts, articles, or policy drafts.
- AI Alignment: Generating varied and realistic user feedback to train and evaluate collaborative AI systems.
- Social Simulation: Modeling opinion dynamics and interactions within online communities.