Name: stanfordnlp/llama8b-nnetnav-wa API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: stanfordnlp

Llama8b-NNetNav-WA: An Instruct-Tuned Web-Agent Model

stanfordnlp/llama8b-nnetnav-wa is an 8 billion parameter model based on Llama-3.1-8B, specifically instruct-tuned by Stanford NLP to function as a web-agent. This model is designed to perform complex web interactions based on natural language instructions, enabling it to navigate websites and execute actions like a human user. It leverages the NNetNav-WA dataset, which consists of synthetic demonstrations collected via unsupervised exploration on WebArena websites.

Key Capabilities

Web Automation: Executes a sequence of actions (click, type, hover, scroll, tab management, URL navigation) to complete tasks on websites.
Instruction Following: Interprets natural language instructions to perform specific web-based objectives, such as "Upvote the post by user smurty123 on subreddit r/LocalLLaMA."
Action Space: Supports a comprehensive set of page operations, tab management, and URL navigation actions, including click [id], type [id] [content], goto [url], and stop [answer].

Performance Highlights

Achieves a 16.3% Success Rate (SR) on the WebArena benchmark, outperforming GPT-4's 14.1% SR in this specific environment.
Attains a 28.1% SR on WebVoyager.

Good For

Controlled Web Interaction: Ideal for automating tasks on websites with structures similar to those found in the WebArena dataset (e.g., Reddit, GitHub, e-commerce, CMS sites).
Research & Development: Useful for exploring and developing browser agents, particularly for tasks requiring precise action execution based on textual observations.
Synthetic Environments: Best suited for applications within self-hosted or controlled web environments where the model's biases from its training data are less impactful.

Limitations

Generalization: May struggle with modern layouts and diverse structures of real, live websites due to training on self-hosted WebArena sites. For live website performance, consider LLama8b-NNetNav-Live.
Bias: Inherits biases from its training data, potentially performing worse on non-English or culturally distinct websites.
Risks: Prone to unintended actions, security/privacy risks (e.g., credential leaks), and adversarial manipulation by dark patterns on malicious websites.
Long-Horizon Tasks: May struggle with tasks requiring deep memory retention, complex multi-step planning, or very long continuous web interactions due to its 20k token sequence length limit.

Overview

Llama8b-NNetNav-WA: An Instruct-Tuned Web-Agent Model

Key Capabilities

Performance Highlights

Good For

Limitations

Full Model Card (README)