GD-ML/Code2World
Code2World-8B by GD-ML is an 8 billion parameter model designed for GUI world modeling. It specializes in predicting the next GUI screenshot by generating renderable code (HTML) given a current GUI observation and an action. This model enables dynamic simulation of user interactions within graphical interfaces, offering a unique approach to understanding and forecasting GUI state changes. Its primary application is in scenarios requiring GUI state prediction and interaction simulation.
Loading preview...
Overview
GD-ML/Code2World is an 8 billion parameter model focused on GUI world modeling through renderable code generation. Unlike traditional language models, Code2World takes a current GUI observation (screenshot) and a user action as input, then predicts the subsequent GUI state by generating the corresponding HTML code. This allows for dynamic simulation and understanding of how user interactions modify graphical interfaces.
Key Capabilities
- GUI State Prediction: Generates the next GUI screenshot based on an input image and a specified action.
- Renderable Code Generation: Outputs HTML code that can be rendered to visualize the predicted GUI state.
- Action Integration: Incorporates user actions (e.g., click, swipe) into its prediction mechanism.
- Hugging Face Transformers Compatibility: Designed to be used seamlessly with the
transformerslibrary, requiring version4.57.0.
How it Works
The model utilizes a Qwen3VLForConditionalGeneration architecture. It processes a system prompt, an image, and a user prompt detailing the instruction and action. The output is then post-processed to extract clean HTML, which can be rendered into an image to show the predicted GUI. Helper functions are provided for building prompts, adding visual hints to input images, and rendering/saving outputs.
Use Cases
Code2World is particularly suited for applications involving:
- Automated GUI testing and validation.
- Interactive agent development for graphical interfaces.
- Prototyping and simulating user experiences.
- Research into GUI understanding and interaction modeling.