cmu-lti/osim-8b

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 4, 2026License:mitArchitecture:Transformer0.0K Open Weights Cold

The cmu-lti/osim-8b model is an 8 billion parameter post-trained text checkpoint of OSim (OdysSim), a foundation model for human behavior simulation built upon Qwen3-8B. It is specifically designed to imitate the human/user side of interactions rather than acting as a helpful assistant. This model excels at user simulation for agent evaluation, social simulation, and persona/role-play, achieving a USI score of 75.6 in out-of-distribution evaluations.

Loading preview...

OSim-8B: A Foundation Model for Human Behavior Simulation

OSim-8B is the post-trained text checkpoint of OdysSim, an 8 billion parameter foundation model developed by cmu-lti for simulating human behavior. Unlike traditional LLMs that aim to be helpful assistants, OSim-8B is specifically trained to imitate the human/user side of interactions. It is built on the Qwen3-8B base model, midtrained on the extensive OdysSim corpus (62 behavioral datasets across five "Soul" axes: CONV/SS/COG/ROLE/EVAL), and then further refined through task-specific reinforcement learning and expert consolidation.

Key Capabilities

  • User Simulation: Designed to simulate the human/user side of conversations, making it ideal for evaluating agents, social simulations, and persona/role-play scenarios.
  • Contextual Response Generation: Generates human-like turns based on a "social-context" system prompt (defining role, goal, background, style) and the other party's conversational turns.
  • High Behavioral Accuracy: Achieves a USI 75.6 in out-of-distribution evaluations using the τ-USI agentic benchmark, outperforming other specialized and general instruct models of similar size.
  • Human-like Reactivity: Demonstrates Sørensen–Dice D4 ≈ 93, matching human inter-annotator levels, and exhibits excellent outcome calibration (best ECE among compared models).

Good For

  • Agent Evaluation: Simulating realistic user interactions to test and refine AI agents.
  • Social Simulation: Modeling human responses and behaviors in various social contexts.
  • Persona/Role-play: Generating authentic human-like dialogue for specific roles or personas.