Overview
McGill-NLP/Llama-3-8B-Web is an 8-billion-parameter model developed by McGill-NLP, fine-tuned from Meta-Llama-3-8B-Instruct. Its primary purpose is to power human-centric web-browsing agents, that is, agents that browse on a user's behalf while following the user's instructions. The model was trained on a curated 24K-instance subset of the WebLINX dataset, which comprises over 100K instances of web navigation and dialogue.
Key Capabilities & Performance
- Web Navigation Excellence: Surpasses zero-shot GPT-4V by over 18 percentage points on the WebLINX benchmark, achieving an overall score of 28.8% on the out-of-domain test splits.
- Improved Action Prediction: Outperforms GPT-4V at choosing useful links (34.1% vs. 18.9% seg-F1), clicking relevant elements (27.1% vs. 13.6% IoU), and formulating aligned responses (37.5% vs. 3.1% chr-F1).
- Robust Evaluation: Evaluated on the WebLINX benchmark, whose four real-world test splits probe generalization to new websites, new domains, new geographic locations, and dialogue-reliant scenarios.
- Continuous Data Curation: The project aims to continuously curate and release datasets, including future integration of Mind2Web's training data, to enhance agent generalization.
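To make the per-metric comparison above concrete, the sketch below computes the gap in percentage points for each pair of quoted scores and shows one common way to compute IoU. It assumes IoU is measured between the bounding boxes of the predicted and reference elements; the benchmark's own evaluation code may differ in its details. The names `SCORES`, `point_gaps`, and `box_iou` are illustrative and not part of any WebLINX API.

```python
# Scores quoted above, in percent: (Llama-3-8B-Web, GPT-4V zero-shot).
SCORES = {
    "seg-F1 (link selection)": (34.1, 18.9),
    "IoU (element clicks)": (27.1, 13.6),
    "chr-F1 (responses)": (37.5, 3.1),
}


def point_gaps(scores):
    """Return the absolute gap, in percentage points, for each metric."""
    return {
        name: round(ours - baseline, 1)
        for name, (ours, baseline) in scores.items()
    }


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes.

    Assumption: boxes are axis-aligned with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    for metric, gap in point_gaps(SCORES).items():
        print(f"{metric}: +{gap} points")
```

Reporting gaps in percentage points rather than as relative improvements keeps the three metrics comparable, since the GPT-4V baselines differ widely (3.1% to 18.9%).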
Use Cases
- Building Web Agents: Ideal for developing agents that can browse the web on behalf of users, performing tasks like booking, shopping, writing, and knowledge lookup.
- Automated Web Interaction: Suitable for scenarios requiring automated interaction with web elements, such as clicking, text input, and form submission.
- Dialogue-Guided Browsing: Particularly effective in situations where the agent needs to navigate the web based on multi-turn dialogue and user instructions.
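An agent built around the model needs to turn its text output into browser operations. The following is a minimal sketch of that step, assuming the model emits each action as a function-call-style string such as `click(uid="a1b2")`. That output format, the action names, and the helpers `parse_action` and `dispatch` are assumptions made for illustration, not a documented interface of the model.

```python
import ast


def parse_action(text):
    """Parse 'name(key="value", ...)' into (name, kwargs) without executing it."""
    try:
        node = ast.parse(text.strip(), mode="eval").body
    except SyntaxError as exc:
        raise ValueError(f"malformed action: {text!r}") from exc
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple call: {text!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only keyword arguments are supported")
    # literal_eval accepts only literals, so no model-generated code is run.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


def dispatch(text, handlers):
    """Route a parsed action to the matching handler (hypothetical registry)."""
    name, kwargs = parse_action(text)
    if name not in handlers:
        raise KeyError(f"unsupported action: {name}")
    return handlers[name](**kwargs)


# Stand-in handlers that record calls; a real agent would drive a browser here.
log = []
handlers = {
    "click": lambda uid: log.append(("click", uid)),
    "textinput": lambda uid, text: log.append(("textinput", uid, text)),
    "say": lambda speaker, utterance: log.append(("say", speaker, utterance)),
}

dispatch('click(uid="a1b2")', handlers)
dispatch('textinput(uid="c3d4", text="flights to Montreal")', handlers)
dispatch('say(speaker="navigator", utterance="Searching now.")', handlers)
```

Parsing with `ast` instead of `eval` means a malformed or unexpected model output raises an error rather than executing arbitrary code, which matters when the agent acts on a user's behalf.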