lgy0404/MemGUI-8B-SFT
VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 8, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold
MemGUI-8B-SFT by lgy0404 is an 8 billion parameter multimodal mobile GUI agent model, fine-tuned from Qwen3-VL-8B-Instruct on the MemGUI-3K dataset. It is specifically designed for long-horizon mobile GUI control, featuring proactive context management through the ConAct Context-as-Action protocol. This model excels at managing action history, UI state, and recent step records to enable robust mobile automation.
Loading preview...
MemGUI-8B-SFT: A Specialized Mobile GUI Agent
MemGUI-8B-SFT is an 8 billion parameter multimodal agent developed by lgy0404, built upon the Qwen3-VL-8B-Instruct base model. It is specifically fine-tuned using the MemGUI-3K dataset to address the challenges of long-horizon mobile GUI control.
Key Capabilities and Features
- Proactive Context Management: Implements the ConAct Context-as-Action protocol, allowing the agent to manage three distinct context fields: Folded Action History, Folded UI State, and Recent Step Record.
- Structured Output: Produces a 5-part structured response at each step, including reasoning, history folding, UI/memory tool calls, grounded UI observations, and action intent.
- Multimodal Input: Designed to process a system prompt, a user message with a task goal and structured context, and a screenshot image.
- Performance: Achieves 23.4% Pass@1 and 35.9% Pass@3 on MemGUI-Bench, outperforming the Qwen3-VL-8B-Instruct baseline and setting a new open-data 8B performance record in experiments. It also demonstrates transferability with a 17.9% success rate on MobileWorld GUI-Only.
Intended Use Cases
- Mobile GUI Agent Research: Ideal for research into mobile GUI agents, long-horizon control, and advanced context management techniques.
- Action Policy Development: Can serve as an action policy within mobile GUI environments that provide visual input and execute structured tool calls.
- UI Memory and History Folding: Specifically useful for exploring and developing systems that require intelligent management of UI state and action history over extended interactions.