cmu-lti/osim-8b
The cmu-lti/osim-8b model is an 8 billion parameter post-trained text checkpoint of OSim (OdysSim), a foundation model for human behavior simulation built upon Qwen3-8B. It is specifically designed to imitate the human/user side of interactions rather than acting as a helpful assistant. This model excels at user simulation for agent evaluation, social simulation, and persona/role-play, achieving a USI score of 75.6 in out-of-distribution evaluations.
Loading preview...
OSim-8B: A Foundation Model for Human Behavior Simulation
OSim-8B is the post-trained text checkpoint of OdysSim, an 8 billion parameter foundation model developed by cmu-lti for simulating human behavior. Unlike traditional LLMs that aim to be helpful assistants, OSim-8B is specifically trained to imitate the human/user side of interactions. It is built on the Qwen3-8B base model, midtrained on the extensive OdysSim corpus (62 behavioral datasets across five "Soul" axes: CONV/SS/COG/ROLE/EVAL), and then further refined through task-specific reinforcement learning and expert consolidation.
Key Capabilities
- User Simulation: Designed to simulate the human/user side of conversations, making it ideal for evaluating agents, social simulations, and persona/role-play scenarios.
- Contextual Response Generation: Generates human-like turns based on a "social-context" system prompt (defining role, goal, background, style) and the other party's conversational turns.
- High Behavioral Accuracy: Achieves a USI 75.6 in out-of-distribution evaluations using the τ-USI agentic benchmark, outperforming other specialized and general instruct models of similar size.
- Human-like Reactivity: Demonstrates Sørensen–Dice D4 ≈ 93, matching human inter-annotator levels, and exhibits excellent outcome calibration (best ECE among compared models).
Good For
- Agent Evaluation: Simulating realistic user interactions to test and refine AI agents.
- Social Simulation: Modeling human responses and behaviors in various social contexts.
- Persona/Role-play: Generating authentic human-like dialogue for specific roles or personas.