lgy0404/MemGUI-8B-SFT

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 8, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

MemGUI-8B-SFT by lgy0404 is an 8 billion parameter multimodal mobile GUI agent model, fine-tuned from Qwen3-VL-8B-Instruct on the MemGUI-3K dataset. It is specifically designed for long-horizon mobile GUI control, featuring proactive context management through the ConAct Context-as-Action protocol. This model excels at managing action history, UI state, and recent step records to enable robust mobile automation.

Loading preview...

MemGUI-8B-SFT: A Specialized Mobile GUI Agent

MemGUI-8B-SFT is an 8 billion parameter multimodal agent developed by lgy0404, built upon the Qwen3-VL-8B-Instruct base model. It is specifically fine-tuned using the MemGUI-3K dataset to address the challenges of long-horizon mobile GUI control.

Key Capabilities and Features

  • Proactive Context Management: Implements the ConAct Context-as-Action protocol, allowing the agent to manage three distinct context fields: Folded Action History, Folded UI State, and Recent Step Record.
  • Structured Output: Produces a 5-part structured response at each step, including reasoning, history folding, UI/memory tool calls, grounded UI observations, and action intent.
  • Multimodal Input: Designed to process a system prompt, a user message with a task goal and structured context, and a screenshot image.
  • Performance: Achieves 23.4% Pass@1 and 35.9% Pass@3 on MemGUI-Bench, outperforming the Qwen3-VL-8B-Instruct baseline and setting a new open-data 8B performance record in experiments. It also demonstrates transferability with a 17.9% success rate on MobileWorld GUI-Only.

Intended Use Cases

  • Mobile GUI Agent Research: Ideal for research into mobile GUI agents, long-horizon control, and advanced context management techniques.
  • Action Policy Development: Can serve as an action policy within mobile GUI environments that provide visual input and execute structured tool calls.
  • UI Memory and History Folding: Specifically useful for exploring and developing systems that require intelligent management of UI state and action history over extended interactions.