ByteDance-Seed/UI-TARS-1.5-7B
UI-TARS-1.5-7B is a 7-billion-parameter multimodal agent developed by ByteDance-Seed, built on a vision-language model with a 32K context length. It is designed for general computer use and performs diverse tasks within virtual environments, integrating advanced reasoning through reinforcement learning. The model demonstrates strong GUI-interaction capabilities across computer-use benchmarks, though it is not specifically optimized for game-based scenarios, where the larger UI-TARS-1.5 model holds the advantage.
UI-TARS-1.5-7B Model Overview
UI-TARS-1.5-7B, developed by ByteDance-Seed, is a 7-billion-parameter open-source multimodal agent built on a vision-language model. It is designed to perform diverse tasks in virtual environments, with a focus on general computer use and GUI interaction. The model's reasoning is enhanced through reinforcement learning: it reasons through its "thoughts" before taking an action, which improves task performance and adaptability.
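The think-before-acting behavior means each model response contains both a reasoning trace and an executable action. As a minimal sketch, assuming a hypothetical "Thought: ... Action: ..." response format (the exact output schema is an illustration, not taken from this card), an agent harness might split the two parts like this:

```python
import re

# Hypothetical response in a "Thought ... Action" style; the exact wording
# and action syntax are assumptions for illustration only.
sample = (
    "Thought: The Submit button is visible in the lower-right corner.\n"
    "Action: click(start_box='(612,480)')"
)

def parse_response(text):
    """Split an agent response into its reasoning and action parts."""
    thought = re.search(r"Thought:\s*(.*?)\s*Action:", text, re.S)
    action = re.search(r"Action:\s*(.*)", text, re.S)
    return (
        thought.group(1) if thought else "",
        action.group(1).strip() if action else "",
    )

thought, action = parse_response(sample)
print(action)  # click(start_box='(612,480)')
```

The harness would log the thought for inspection and dispatch only the action string to the GUI controller.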
Key Capabilities and Performance
- Multimodal Agent: Built on a vision-language model, enabling interaction with visual interfaces.
- Enhanced Reasoning: Utilizes reinforcement learning for advanced reasoning, improving task execution and adaptability.
- Strong Computer Use: Scores 27.5 on the OSWorld computer-use benchmark and 49.6 on ScreenSpot-Pro for GUI grounding.
- General Purpose: While part of the UI-TARS 1.5 family, this 7B variant is primarily optimized for general computer use rather than game-specific scenarios.
Use Cases
- Automated GUI Interaction: Ideal for tasks requiring interaction with graphical user interfaces.
- Virtual Environment Task Execution: Capable of performing diverse tasks within various virtual worlds.
- Research and Development: Provides a foundation for exploring multimodal agent capabilities and advanced reasoning in AI.
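For automated GUI interaction, a typical request pairs a screenshot with a natural-language instruction. A minimal sketch of assembling such a request in the Hugging Face `transformers` chat-message convention follows; the file path and instruction are illustrative placeholders, and model invocation is only indicated in comments since it requires downloading the weights:

```python
def build_messages(screenshot_path, instruction):
    """Assemble a single-turn multimodal message: one screenshot plus a task."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": screenshot_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]

messages = build_messages("screenshot.png", "Open the Settings menu.")

# With the weights available, these messages would feed a processor/model pair,
# e.g. (sketch, not verified against the released checkpoint):
#   processor = AutoProcessor.from_pretrained("ByteDance-Seed/UI-TARS-1.5-7B")
#   inputs = processor.apply_chat_template(messages, tokenize=True,
#                                          add_generation_prompt=True,
#                                          return_tensors="pt")
print(messages[0]["content"][1]["text"])  # Open the Settings menu.
```

Keeping message construction separate from inference makes it easy to swap the local model for a hosted endpoint that accepts the same chat format.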