McGill-NLP/A3-Qwen3.5-9B
McGill-NLP/A3-Qwen3.5-9B is a 9-billion-parameter multimodal web agent model, fine-tuned by McGill-NLP from Qwen/Qwen3.5-9B. It is designed for web-based tasks and trained on the A3-Synth synthetic dataset. It scores 41.5% on WebArena, outperforming closed-source models such as Claude 3.5 Sonnet and GPT-4o.
Model Overview
A3-Qwen3.5-9B is a 9-billion-parameter multimodal web agent developed by McGill-NLP. It is fine-tuned from the Qwen/Qwen3.5-9B base model on the A3-Synth dataset, which contains 16,000 examples derived from Gemini 3 Pro trajectories. The model targets complex, multi-step interactions in web environments.
Key Capabilities
- Multimodal Web Agency: Designed to act as an agent within web environments, capable of understanding and interacting with web content.
- High WebArena Performance: Achieves a notable 41.5% score on the WebArena benchmark, surpassing closed-source models such as Claude 3.5 Sonnet (36.0%) and GPT-4o (31.5%) under the same evaluation protocol.
- Structured Distillation: Trained with the Agent-as-Annotators (A3) framework, which distills web agent capabilities in a structured way with the aim of improving generalization.
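To make the reported gaps concrete, the sketch below computes them from the WebArena scores listed above. It reports absolute gaps in percentage points and relative gains over each baseline. The dictionary and helper names are illustrative only and are not part of any A3 release.

```python
# WebArena success rates (%) as reported in this model card.
scores = {
    "A3-Qwen3.5-9B": 41.5,
    "Claude 3.5 Sonnet": 36.0,
    "GPT-4o": 31.5,
}


def margin_over(table: dict[str, float], model: str) -> dict[str, float]:
    """Absolute gap, in percentage points, between `model` and every other entry."""
    ref = table[model]
    return {name: round(ref - s, 1) for name, s in table.items() if name != model}


def relative_gain(table: dict[str, float], model: str) -> dict[str, float]:
    """Gap expressed as a percentage of each baseline's own score."""
    ref = table[model]
    return {name: round(100 * (ref - s) / s, 1) for name, s in table.items() if name != model}


print(margin_over(scores, "A3-Qwen3.5-9B"))    # {'Claude 3.5 Sonnet': 5.5, 'GPT-4o': 10.0}
print(relative_gain(scores, "A3-Qwen3.5-9B"))  # {'Claude 3.5 Sonnet': 15.3, 'GPT-4o': 31.7}
```

The absolute gaps are 5.5 points over Claude 3.5 Sonnet and 10.0 points over GPT-4o. These comparisons hold only under the shared evaluation protocol mentioned above.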
Training Details
The model was trained with Supervised Fine-Tuning (SFT) using FSDP for 2 epochs. Training used a maximum sequence length of 16,384 tokens and a learning rate of 1e-5. The batch size was 1 per GPU, with gradient accumulation over 4 steps.
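The card does not state how many GPUs were used, so the effective batch size and the total number of optimizer steps are not given directly. The sketch below derives them from the listed hyperparameters for a chosen GPU count. It rests on three assumptions:

- The class name `A3TrainingSetup` is hypothetical.
- The GPU counts are inputs you pick, not reported values.
- All 16,000 A3-Synth examples were used, one example per sequence with no packing, and the last partial batch is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class A3TrainingSetup:
    """Hypothetical container for the hyperparameters listed in the model card."""
    max_seq_len: int = 16_384     # tokens
    learning_rate: float = 1e-5
    epochs: int = 2
    per_gpu_batch_size: int = 1
    grad_accum_steps: int = 4
    num_examples: int = 16_000    # A3-Synth size; assumes every example is used once per epoch

    def effective_batch_size(self, num_gpus: int) -> int:
        # Each optimizer step accumulates per_gpu_batch_size * grad_accum_steps
        # examples on every data-parallel GPU.
        return self.per_gpu_batch_size * self.grad_accum_steps * num_gpus

    def optimizer_steps(self, num_gpus: int) -> int:
        # Assumes no sequence packing and that a final partial batch is not dropped.
        steps_per_epoch = math.ceil(self.num_examples / self.effective_batch_size(num_gpus))
        return steps_per_epoch * self.epochs


setup = A3TrainingSetup()
for gpus in (1, 8):  # illustrative GPU counts, not reported values
    print(gpus, setup.effective_batch_size(gpus), setup.optimizer_steps(gpus))
```

On a hypothetical 8-GPU node, for example, this gives an effective batch of 32 examples and 1,000 optimizer steps over the 2 epochs.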
Use Cases
The model targets applications that require autonomous web interaction, such as web automation and multi-step task execution in a browser. Its WebArena results suggest it is a strong candidate where capable web-agent behavior is needed from a 9B-scale model.