Name: stanfordnlp/llama8b-nnetnav-live API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: stanfordnlp

Model Overview

stanfordnlp/llama8b-nnetnav-live is an 8-billion-parameter web agent model built on the Llama-3.1-8B architecture and instruction-tuned by Stanford NLP. It executes web actions from natural language instructions, acting as a browser agent. The model was trained on the NNetNav-Live dataset, which comprises approximately 5,000 synthetic demonstrations collected through unsupervised exploration of 15 live websites, including Amazon, Google, and GitHub.

Key Capabilities

Web Task Automation: Designed to perform a sequence of actions on live websites given a high-level instruction (e.g., "Upvote the post by user smurty123 on subreddit r/LocalLLaMA").
Comprehensive Action Space: Supports a wide range of browser operations including click, type, hover, press key combinations, scroll, new_tab, tab_focus, close_tab, goto URL, go_back, go_forward, and stop with an answer.
Performance: Achieves a 9.5% success rate on the WebArena benchmark and 35.2% on the WebVoyager benchmark.
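
The action space listed above can be checked programmatically before an action is sent to a browser. The following is a minimal sketch, not the model's official output format: the action names are shortened from the list above (for example, "press" for key combinations and "stop" for stopping with an answer), and the bracketed-argument syntax, the Action class, and the parse_action helper are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Action names shortened from the model card's list. The textual syntax
# "name [arg] [arg]" is an assumption for illustration, not a confirmed format.
ACTIONS = {
    "click", "type", "hover", "press", "scroll", "new_tab",
    "tab_focus", "close_tab", "goto", "go_back", "go_forward", "stop",
}


@dataclass
class Action:
    name: str
    args: list[str]


def parse_action(text: str) -> Action:
    """Parse a string such as 'type [12] [hello world]' into an Action."""
    text = text.strip()
    match = re.match(r"^(\w+)", text)
    if match is None:
        raise ValueError(f"no action name in {text!r}")
    name = match.group(1)
    if name not in ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    # Every bracketed group after the name is treated as one argument.
    args = re.findall(r"\[([^\]]*)\]", text[match.end():])
    return Action(name, args)
```

Rejecting unknown action names before execution is one simple guard against malformed model output reaching a live website.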

Use Cases and Limitations

This model is particularly well-suited for automating repetitive or complex web interactions. However, users should be aware of its limitations:

Bias: Inherits biases from its training data, potentially struggling with out-of-domain websites (e.g., government sites) or non-English/culturally distinct platforms.
Risks: May take unintended actions on ambiguous websites, poses security and privacy risks (e.g., credential exposure), and can be manipulated by malicious websites that use dark patterns.
Generalization: May struggle with websites significantly different from the 15 sites it was trained on. For distinct domains, training a custom NNetNav model is recommended.
Instruction Sensitivity: Vague instructions can lead to unintended or suboptimal actions.
Long-Horizon Tasks: Performance may degrade on tasks requiring deep memory retention or complex multi-step planning because of its 20,000-token sequence length limit.
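
One way to work within the 20,000-token limit on long tasks is to keep only the most recent steps of the interaction history. The sketch below illustrates that idea under stated assumptions: trim_history is a hypothetical helper, and the default whitespace word count is only a crude stand-in for the model's real tokenizer, which should be substituted through the count_tokens parameter.

```python
def trim_history(steps, budget=20_000, count_tokens=lambda s: len(s.split())):
    """Keep the most recent steps whose combined token count fits the budget.

    The default count_tokens splits on whitespace, which only approximates
    a real tokenizer; pass the model's tokenizer for accurate counts.
    """
    kept = []
    used = 0
    # Walk from newest to oldest so recent context is kept first.
    for step in reversed(steps):
        cost = count_tokens(step)
        if used + cost > budget:
            break
        kept.append(step)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest steps first is a design choice, not a recommendation from the model card; tasks that depend on early observations may need summarization instead.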
