Stevenshuqing/gui360-fullparam-sft-step250
Stevenshuqing/gui360-fullparam-sft-step250 is a 7.6 billion parameter model based on Qwen2.5-VL-7B-Instruct, fine-tuned using full-parameter SFT on the GUI-360 balanced 2K dataset. This model is specifically optimized for GUI agent tasks, demonstrating improved task success rates compared to its base model and other PEFT methods. It excels in automating graphical user interface interactions, achieving a 22.2% Task Success Rate and 69.3% Step Success Rate on the GUI-360 test set.
Loading preview...
Model Overview
The gui360-fullparam-sft-step250 model, developed by Stevenshuqing, is a 7.6 billion parameter language model built upon the Qwen2.5-VL-7B-Instruct architecture. It has undergone full-parameter Supervised Fine-Tuning (SFT) using the GUI-360 balanced 2K dataset, which comprises 17,264 steps across various action types including click, type, and swipe. This fine-tuning process, conducted with LLaMA-Factory and ZeRO-3, aims to enhance the model's capabilities in interacting with graphical user interfaces.
Key Capabilities and Performance
This model is specifically designed for GUI agent tasks, demonstrating significant improvements over its base model. On the GUI-360 test 1K balanced dataset, it achieves notable performance metrics:
- Task Success Rate (TSR): 22.2%
- Step Success Rate (StepSR): 69.3%
- Progress: 35.3%
These results position it as the top performer among the evaluated methods, surpassing even Cooperative RL and various PEFT approaches in the GUI-360 benchmark. For instance, the base model (zero-shot) only achieved a 2.4% TSR, highlighting the effectiveness of the full-parameter SFT.
Use Cases
This model is particularly well-suited for applications requiring automated interaction with graphical user interfaces. Potential use cases include:
- Automated UI testing: Simulating user interactions to test software applications.
- Robotic Process Automation (RPA): Automating repetitive tasks within GUI environments.
- Intelligent agents: Developing agents that can navigate and operate software interfaces autonomously.