McGill-NLP/Llama-3-8B-Web

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Apr 22, 2024 · License: llama3 · Architecture: Transformer

McGill-NLP/Llama-3-8B-Web is an 8-billion-parameter, Llama 3-based instruction-tuned model developed by McGill-NLP and fine-tuned on the WebLINX dataset for web navigation tasks. It is designed to power web-browsing agents and outperforms GPT-4V (zero-shot) on the WebLINX benchmark. Its main strengths are choosing useful links, clicking relevant elements, and formulating aligned responses in human-centric web browsing.


Overview

McGill-NLP/Llama-3-8B-Web is an 8-billion-parameter model developed by McGill-NLP and fine-tuned from Meta-Llama-3-8B-Instruct. It is intended to power web-browsing agents, particularly for human-centric interactions. The model was trained on a curated 24K-instance subset of the WebLINX dataset, which contains over 100K instances of web navigation and dialogue.

Key Capabilities & Performance

  • Web Navigation Excellence: Surpasses GPT-4V (zero-shot) by more than 18 percentage points on the WebLINX benchmark, reaching an overall score of 28.8% on the out-of-domain test splits.
  • Improved Action Prediction: Outperforms GPT-4V at choosing useful links (34.1% vs. 18.9% seg-F1), clicking relevant elements (27.1% vs. 13.6% IoU), and formulating aligned responses (37.5% vs. 3.1% chr-F1).
  • Robust Evaluation: Evaluated on the WebLINX benchmark, which includes four real-world splits that test generalization to new websites, new domains, new geographic locations, and dialogue-reliant scenarios.
  • Continuous Data Curation: The project aims to continuously curate and release datasets, including future integration of Mind2Web's training data, to enhance agent generalization.
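The three per-action metrics named above (seg-F1 for links, IoU for elements, chr-F1 for responses) can be illustrated with small, simplified functions. This is a sketch, not the official WebLINX evaluation code: the (x, y, width, height) box format, the URL tokenization, and the whitespace handling in chr-F1 are assumptions, and the function names are hypothetical. Reference chrF implementations (for example in sacrebleu) treat edge cases differently.

```python
import re
from collections import Counter


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x, y, width, height) (assumed format)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap rectangle; zero when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def chr_f1(hyp: str, ref: str, max_n: int = 6) -> float:
    """Simplified chrF with beta = 1: character n-gram precision and recall
    averaged over n = 1..max_n, then combined into an F1 score."""
    # Remove all whitespace before extracting character n-grams.
    hyp, ref = "".join(hyp.split()), "".join(ref.split())
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
        r = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
        if not h or not r:
            continue  # skip n-gram orders longer than either string
        overlap = sum((h & r).values())
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def _url_segments(url: str) -> Counter:
    """Split a URL on '/', '.', '?', '&', '=' and ':' (assumed tokenization)."""
    return Counter(s for s in re.split(r"[/.?&=:]+", url) if s)


def url_seg_f1(pred_url: str, ref_url: str) -> float:
    """F1 between the multisets of segments of a predicted and a reference URL."""
    pred, ref = _url_segments(pred_url), _url_segments(ref_url)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    p = overlap / sum(pred.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)
```

For instance, two 10×10 boxes offset by 5 pixels in each direction share a 5×5 region, giving an IoU of 25 / 175 = 1/7, and two URLs that differ in one of six segments score 5/6 on seg-F1.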

Use Cases

  • Building Web Agents: Ideal for developing agents that can browse the web on behalf of users, performing tasks like booking, shopping, writing, and knowledge lookup.
  • Automated Web Interaction: Suitable for scenarios requiring automated interaction with web elements, such as clicking, text input, and form submission.
  • Dialogue-Guided Browsing: Particularly effective in situations where the agent needs to navigate the web based on multi-turn dialogue and user instructions.
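Using the model in a web agent requires turning its text output into concrete browser operations such as clicking, text input, and form submission. The sketch below assumes, without confirmation from this card, that the model emits one function-call style action string per turn, such as `click(uid="a1b2")`; the exact output format should be checked against the WebLINX prompt templates. `Action`, `parse_action`, and `dispatch` are hypothetical helpers, and `dispatch` only returns a description instead of driving a real browser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    args: dict = field(default_factory=dict)


# Matches a whole call such as: text_input(text="Paris", uid="c3d4")
_CALL_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)
# Matches key="value" pairs; values may contain backslash-escaped quotes.
_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_action(text: str) -> Action:
    """Parse an (assumed) function-call style action string into name and kwargs."""
    match = _CALL_RE.match(text)
    if match is None:
        raise ValueError(f"not an action call: {text!r}")
    name, arg_str = match.groups()
    args = {key: value.replace('\\"', '"') for key, value in _ARG_RE.findall(arg_str)}
    return Action(name, args)


def dispatch(action: Action) -> str:
    """Map a parsed action to a hypothetical browser operation, stubbed as text."""
    if action.name == "click":
        return f"click element {action.args['uid']}"
    if action.name == "text_input":
        return f"type {action.args['text']!r} into element {action.args['uid']}"
    if action.name == "submit":
        return f"submit form via element {action.args['uid']}"
    if action.name == "say":
        return f"reply to user: {action.args['utterance']}"
    raise ValueError(f"unsupported action: {action.name}")
```

Keeping parsing separate from dispatch lets the same parser feed a real browser driver, a logger, or an offline evaluator; unknown action names fail loudly rather than being silently ignored.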