stanfordnlp/llama8b-nnetnav-live

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Jan 28, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The stanfordnlp/llama8b-nnetnav-live model is an 8-billion-parameter, instruction-tuned web agent based on Llama-3.1-8B and developed by Stanford NLP. It is designed for web navigation and task execution on live websites and was trained on unsupervised exploration data from 15 diverse web platforms. Given a natural language instruction, it issues browser actions such as clicking, typing, and navigating, which makes it suitable for automating browser-based tasks.


Model Overview

stanfordnlp/llama8b-nnetnav-live is an 8-billion-parameter web agent model, built on the Llama-3.1-8B architecture and instruction-tuned by Stanford NLP. Its core capability is executing web actions from natural language instructions, so that it acts as a browser agent. The model was trained on the NNetNav-Live dataset, which comprises approximately 5,000 synthetic demonstrations collected through unsupervised exploration on 15 live websites, including major platforms such as Amazon, Google, and GitHub.

Key Capabilities

  • Web Task Automation: Designed to perform a sequence of actions on live websites given a high-level instruction (e.g., "Upvote the post by user smurty123 on subreddit r/LocalLLaMA").
  • Comprehensive Action Space: Supports a wide range of browser operations including click, type, hover, press key combinations, scroll, new_tab, tab_focus, close_tab, goto URL, go_back, go_forward, and stop with an answer.
  • Performance: Achieves a 9.5% success rate on WebArena and 35.2% on WebVoyager, two web navigation benchmarks.
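
As an illustration of how an agent loop might consume the action space listed above, here is a minimal sketch of an action parser. The action names come from the list above; the bracketed-argument syntax (e.g. `click [42]`) is an assumption borrowed from WebArena-style agents and is not specified in this card, and `Action` and `parse_action` are hypothetical helper names.

```python
import re
from dataclasses import dataclass

# Action names as listed in the card. The bracketed-argument syntax parsed
# below is an ASSUMPTION (WebArena-style), not something the card specifies.
ACTIONS = {
    "click", "type", "hover", "press", "scroll", "new_tab", "tab_focus",
    "close_tab", "goto", "go_back", "go_forward", "stop",
}


@dataclass
class Action:
    """Hypothetical container for one parsed browser action."""
    name: str
    args: list


def parse_action(text: str) -> Action:
    """Parse a string such as 'type [12] [hello world] [1]' into an Action."""
    text = text.strip()
    match = re.match(r"^(\w+)", text)
    if match is None or match.group(1) not in ACTIONS:
        raise ValueError(f"unknown action: {text!r}")
    # Each argument is assumed to be wrapped in its own pair of square brackets.
    args = re.findall(r"\[([^\]]*)\]", text)
    return Action(name=match.group(1), args=args)
```

Rejecting unknown action names before anything reaches the browser is a deliberate choice here: it gives the surrounding loop a chance to re-prompt the model instead of executing a malformed step.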

Use Cases and Limitations

This model is particularly well-suited for automating repetitive or complex web interactions. However, users should be aware of its limitations:

  • Bias: Inherits biases from its training data, potentially struggling with out-of-domain websites (e.g., government sites) or non-English/culturally distinct platforms.
  • Risks: May take unintended actions on ambiguous websites, expose users to security or privacy risks (e.g., credential exposure), and be manipulated by malicious websites that use dark patterns.
  • Generalization: May struggle with websites significantly different from the 15 sites it was trained on. For distinct domains, training a custom NNetNav model is recommended.
  • Instruction Sensitivity: Vague instructions can lead to unintended or suboptimal actions.
  • Long-Horizon Tasks: Performance may degrade on tasks that require deep memory retention or complex multi-step planning, because of its 20,000-token sequence length limit.
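
One way a caller might work within the sequence length limit noted above is to drop the oldest steps of the interaction history before each model call. The sketch below is only an illustration: `trim_history` is a hypothetical helper, and the default whitespace-based token count is a rough stand-in that should be replaced with the model's actual tokenizer.

```python
def trim_history(steps, budget, count_tokens=lambda s: len(s.split())):
    """Keep the most recent steps whose combined token count fits in budget.

    `count_tokens` defaults to a whitespace split, which is only an
    approximation; pass the real tokenizer's counting function in practice.
    """
    kept = []
    used = 0
    # Walk from newest to oldest so the most recent context is retained.
    for step in reversed(steps):
        cost = count_tokens(step)
        if used + cost > budget:
            break
        kept.append(step)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Example: keep as much recent history as fits in a 20,000-token budget.
history = ["observation 1 ...", "click [3]", "observation 2 ..."]
recent = trim_history(history, budget=20_000)
```

Trimming from the oldest end keeps the current page observation intact, at the cost of losing early context, which is the trade-off the limitation above describes.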