PhoneBuddyAI/PhoneBuddy-4B
PhoneBuddyAI/PhoneBuddy-4B is a 4.5-billion-parameter Qwen3.5 VL-style checkpoint developed by PhoneBuddyAI. It is the project's main Real+Mock reinforcement-learning model and is intended for research on phone agents, multimodal tool use, and visual action reasoning. The model emits tool calls in a Qwen-style XML format, which gives agentic applications structured output to parse and act on within mobile environments.
PhoneBuddy-4B: A Specialized Model for Phone Agents
PhoneBuddy-4B is a 4.5-billion-parameter model from PhoneBuddyAI, built on a Qwen3.5 VL-style architecture. It is the core Real+Mock reinforcement-learning checkpoint of the PhoneBuddy project and targets agentic use on phone interfaces.
Key Capabilities
- Multimodal Tool Use: Designed to integrate and utilize various tools, crucial for complex agent interactions.
- Visual Action Reasoning: Excels in understanding and acting based on visual information, particularly relevant for phone interfaces.
- Qwen-style XML Tool-Call Format: Employs a structured XML format for tool calls, ensuring precise and consistent communication with external functions. This format is defined by the bundled chat_template.jinja.
- Research-Oriented: Primarily intended for research into phone agents, offering a robust platform for developing and testing AI that can interact with mobile environments.
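To make the XML tool-call format more concrete, here is a minimal Python sketch of how a client might extract tool calls from generated text. The tag layout used here (a tool_call wrapper containing function=NAME and parameter=KEY elements) is an assumption modeled on a common Qwen-style convention; the authoritative definition is the bundled chat_template.jinja, which may differ. The tool name tap and its x and y parameters are hypothetical.

```python
import re

# ASSUMPTION: tag layout follows a common Qwen-style XML convention.
# Check the bundled chat_template.jinja for the actual format.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in model output."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        # Parameter values are kept as stripped strings; type conversion
        # is left to the caller, who knows each tool's schema.
        args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls


# Hypothetical model output containing one call to a made-up "tap" tool.
sample = """I will tap the Settings icon.
<tool_call>
<function=tap>
<parameter=x>
120
</parameter>
<parameter=y>
340
</parameter>
</function>
</tool_call>"""

print(parse_tool_calls(sample))
```

The parser ignores any free text around the call and returns an empty list when no call is present, so the caller can tell a plain reply from an action request.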
Intended Use Cases
- Phone Agent Development: Ideal for researchers and developers building AI agents that operate within or interact with smartphone ecosystems.
- Multimodal Interaction Research: Suitable for projects exploring how AI can combine visual input with tool use to perform tasks.
- Visual Reasoning Tasks: Applicable to scenarios requiring an AI to interpret visual cues and make decisions based on them.
This model requires a compatible Qwen3.5 VL / PhoneBuddy training or inference environment for full functionality, as it uses specific model_type and processor metadata.
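One way to check compatibility before loading is to read the checkpoint's model_type from its config.json and compare it with the types your environment supports. The sketch below assumes the checkpoint has been downloaded to a local directory and follows the usual Hugging Face layout with a config.json at the top level; the set of supported types is a placeholder that you would fill in from your own environment.

```python
import json
from pathlib import Path


def read_model_type(checkpoint_dir):
    """Return the model_type recorded in config.json, or None if absent."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    return config.get("model_type")


def is_supported(checkpoint_dir, supported_types):
    """Check the checkpoint's model_type against a caller-supplied set.

    supported_types is hypothetical input: the model_type strings that
    your training or inference stack is known to handle.
    """
    return read_model_type(checkpoint_dir) in supported_types
```

A False result suggests the installed stack does not recognize the checkpoint and a PhoneBuddy-compatible environment is needed.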