trillionlabs/gWorld-8B

Modality: Vision | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Jan 28, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

trillionlabs/gWorld-8B is an 8-billion-parameter Vision-Language Model (VLM) developed by Trillion Labs, based on the Qwen3-VL-8B architecture. It is the first open-weight VLM specialized for visual mobile GUI world modeling: it predicts the next GUI state as executable web code. The model performs action-conditioned next-state prediction for mobile interfaces, generating HTML/CSS with pixel-perfect text rendering and structurally accurate layouts. It sets a new Pareto frontier in GUI world modeling accuracy, outperforming models up to 50.25x larger on GUI-specific benchmarks.


gWorld-8B: Mobile GUI World Model

gWorld-8B (GuiWorld) is an 8-billion-parameter Vision-Language Model (VLM) from Trillion Labs, built on the Qwen3-VL-8B architecture. It is designed for visual mobile GUI world modeling and predicts the next GUI state as executable web code rather than pixels. This approach keeps text rendering and structural layout faithful, mitigating the hallucinated content and illegible text common in pixel-generation models. The model was introduced in the paper "Generative Visual Code Mobile World Models", which was accepted to ICML 2026.

Key Capabilities

  • Action-Conditioned Next-State Prediction: Given a mobile screenshot and a user action (e.g., TAP, TYPE), gWorld-8B generates the logical next state of the GUI.
  • Executable Web Code Output: It produces renderable HTML/CSS code, ensuring pixel-perfect text and accurate layouts. The render failure rate is below 1%, and rendering is fast (~0.3s via Playwright).
  • New Pareto Frontier: Sets a new efficiency-accuracy Pareto frontier in GUI world modeling, outperforming models up to 50.25x larger on GUI-specific benchmarks.
  • High Accuracy: Achieves a +45.7% gain in Instruction Accuracy (IAcc.) over its base Qwen3-VL model.
  • Zero-Shot Generalization: Demonstrates strong performance on out-of-distribution benchmarks like AndroidWorld and KApps.
  • Reasoning Generation: Before outputting code, the model generates a "Next State Reasoning" block to explain the visual transition based on the action.
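
As an illustration of how the two-part output described above (a reasoning block followed by code) might be consumed, here is a minimal sketch that splits a raw response into its reasoning text and its HTML document. The response layout is an assumption, not documented behavior: the sketch assumes the HTML starts at the first `<!DOCTYPE html>` or `<html` and ends at the last `</html>`, possibly wrapped in a code fence. The names `NextState` and `split_response` are hypothetical helpers, not part of any gWorld-8B API.

```python
import re
from dataclasses import dataclass


@dataclass
class NextState:
    reasoning: str  # free text that precedes the code
    html: str       # the HTML document to hand to a renderer


# Assumed markers for where the HTML document begins and ends.
_START = re.compile(r"<!doctype html|<html", re.IGNORECASE)
_END = re.compile(r"</html>", re.IGNORECASE)


def split_response(text: str) -> NextState:
    """Split a raw model response into reasoning text and an HTML document."""
    start = _START.search(text)
    ends = list(_END.finditer(text))
    if start is None or not ends or ends[-1].start() < start.start():
        # Treat a response without a complete document as a render failure.
        raise ValueError("no complete HTML document found in model output")
    html = text[start.start():ends[-1].end()]
    reasoning = text[:start.start()]
    # Drop an opening code fence (e.g. ```html) if the model emitted one.
    reasoning = re.sub(r"```\w*\s*$", "", reasoning).strip()
    return NextState(reasoning=reasoning, html=html)
```

The extracted `html` string could then be passed to a browser automation tool for rendering, for example Playwright's `page.set_content` followed by `page.screenshot`. Counting how often `split_response` raises `ValueError` over a batch of responses gives a rough proxy for a render failure rate, though it is not the metric reported above.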

Use Cases

gWorld-8B is ideal for applications requiring precise and dynamic simulation of mobile user interfaces, such as:

  • Automated mobile UI testing and validation.
  • Interactive mobile app prototyping and design tools.
  • Developing intelligent agents for mobile device control and interaction.
  • Research in generative UI design and human-computer interaction.
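
For the UI testing use case, one possible check is whether a predicted next state contains the labels a test expects to see. The sketch below uses only Python's standard `html.parser` module. It assumes the predicted state is already available as an HTML string, and the helper names `visible_text` and `missing_labels` are hypothetical, not part of the model's tooling.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> list[str]:
    """Return the non-empty text nodes of an HTML document, in order."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


def missing_labels(html: str, expected: list[str]) -> list[str]:
    """Return the expected labels that do not appear in the visible text."""
    text = " ".join(visible_text(html))
    return [label for label in expected if label not in text]
```

A test harness could call `missing_labels(predicted_html, ["Wi-Fi", "Bluetooth"])` and fail when the returned list is non-empty. This checks text presence only. It says nothing about layout or visual fidelity, which would require rendering the page.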