GD-ML/Code2World

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Feb 24, 2026License:mitArchitecture:Transformer0.0K Open Weights Cold

Code2World-8B by GD-ML is an 8 billion parameter model designed for GUI world modeling. It specializes in predicting the next GUI screenshot by generating renderable code (HTML) given a current GUI observation and an action. This model enables dynamic simulation of user interactions within graphical interfaces, offering a unique approach to understanding and forecasting GUI state changes. Its primary application is in scenarios requiring GUI state prediction and interaction simulation.

Loading preview...

Overview

GD-ML/Code2World is an 8 billion parameter model focused on GUI world modeling through renderable code generation. Unlike traditional language models, Code2World takes a current GUI observation (screenshot) and a user action as input, then predicts the subsequent GUI state by generating the corresponding HTML code. This allows for dynamic simulation and understanding of how user interactions modify graphical interfaces.

Key Capabilities

  • GUI State Prediction: Generates the next GUI screenshot based on an input image and a specified action.
  • Renderable Code Generation: Outputs HTML code that can be rendered to visualize the predicted GUI state.
  • Action Integration: Incorporates user actions (e.g., click, swipe) into its prediction mechanism.
  • Hugging Face Transformers Compatibility: Designed to be used seamlessly with the transformers library, requiring version 4.57.0.

How it Works

The model utilizes a Qwen3VLForConditionalGeneration architecture. It processes a system prompt, an image, and a user prompt detailing the instruction and action. The output is then post-processed to extract clean HTML, which can be rendered into an image to show the predicted GUI. Helper functions are provided for building prompts, adding visual hints to input images, and rendering/saving outputs.

Use Cases

Code2World is particularly suited for applications involving:

  • Automated GUI testing and validation.
  • Interactive agent development for graphical interfaces.
  • Prototyping and simulating user experiences.
  • Research into GUI understanding and interaction modeling.