NexaAI/Octopus-v2

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 16, 2024 | License: cc-by-nc-4.0 | Architecture: Transformer | 0.9K | Open Weights | Cold

NexaAI/Octopus-v2 is a 2 billion parameter language model developed by Nexa AI and engineered for efficient on-device function calling. It uses a functional token strategy for both training and inference, which gives it high accuracy and significantly faster inference than RAG-based methods and even GPT-4-turbo. The model generates individual, nested, and parallel function calls, making it well suited to Android API orchestration and edge computing applications.

Octopus-v2: On-Device Function Calling Language Model

Nexa AI's Octopus-v2 is a 2 billion parameter language model designed for highly efficient on-device function calling, particularly for Android APIs. It introduces a novel functional token strategy that optimizes both training and inference, allowing it to achieve performance comparable to larger models like GPT-4 while operating at significantly higher speeds.
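
To make the functional token idea concrete, here is a minimal sketch of how an application might turn a model output into an actual call. It assumes each callable function is bound to one special token and that the model emits that token followed by its arguments and an end marker. The token names (`<fn_0>`, `<fn_end>`), the output layout, and the two handler functions are hypothetical placeholders for illustration, not the model's documented vocabulary or API.

```python
import ast
import re

# Hypothetical functional tokens; the real names and their number depend on
# the model's vocabulary and are NOT taken from the model card.
END_TOKEN = "<fn_end>"
CALL_PATTERN = re.compile(r"<(fn_\d+)>\((.*?)\)" + re.escape(END_TOKEN), re.DOTALL)


def take_a_photo(camera="back"):
    """Hypothetical stand-in for a device API."""
    return f"photo taken with {camera} camera"


def set_volume(level):
    """Hypothetical stand-in for a device API."""
    return f"volume set to {level}"


# One functional token per callable function (assumed mapping).
REGISTRY = {"fn_0": take_a_photo, "fn_1": set_volume}


def dispatch(model_output):
    """Find the first functional-token call in the text and execute it."""
    match = CALL_PATTERN.search(model_output)
    if match is None:
        raise ValueError("no functional token found in model output")
    token, raw_args = match.groups()
    # Parse the argument list as a tuple of Python literals, never as code.
    args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
    return REGISTRY[token](*args)


print(dispatch("<fn_0>('front')<fn_end>"))
```

Because the function choice collapses to a single token in this scheme, the application only needs a table lookup to resolve it. Parsing arguments with `ast.literal_eval` rather than `eval` keeps the model output from running arbitrary code.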

Key Capabilities:

  • Exceptional Inference Speed: 36X faster than the "Llama7B + RAG solution" on an A100 GPU and 168% faster than GPT-4-turbo, a gain attributed to its functional token design.
  • High Function Call Accuracy: Achieves 98-100% function call accuracy, surpassing the "Llama7B + RAG solution" by 31% and matching GPT-4 and RAG + GPT-3.5.
  • Versatile Function Calling: Capable of generating individual, nested, and parallel function calls across complex scenarios.
  • On-Device Optimization: Engineered for seamless operation on Android devices, supporting applications from system management to multi-device orchestration.
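
The three call shapes listed above (individual, nested, parallel) can be handled by one small evaluator. The sketch below is an illustration under stated assumptions, not the model's documented output format: the `<fn_k>` and `<fn_end>` tokens, the `;` separator between parallel calls, and the three device functions are all hypothetical.

```python
import ast
import re


# Hypothetical device functions, for illustration only.
def get_location():
    return "Berlin"


def get_weather(city):
    return f"sunny in {city}"


def send_text(recipient, body):
    return f"to {recipient}: {body}"


# Hypothetical token-to-function table.
FUNCTIONS = {"fn_0": get_location, "fn_1": get_weather, "fn_2": send_text}


def _evaluate(node):
    """Evaluate a literal, or a call to a registered function, recursively."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.keywords:
            raise ValueError("keyword arguments are not supported in this sketch")
        func = FUNCTIONS[node.func.id]
        # Nested calls appear as argument nodes and are resolved first.
        return func(*[_evaluate(arg) for arg in node.args])
    # Anything else must be a plain literal (string, number, list, ...).
    return ast.literal_eval(node)


def run_calls(model_output):
    """Run every call in the output; parallel calls are assumed ';'-separated."""
    # Turn "<fn_1>" into the identifier "fn_1" and drop the end marker.
    text = re.sub(r"<(fn_\d+)>", r"\1", model_output.replace("<fn_end>", ""))
    results = []
    # Naive split: a ';' inside a string argument would break this sketch.
    for statement in text.split(";"):
        statement = statement.strip()
        if statement:
            tree = ast.parse(statement, mode="eval")
            results.append(_evaluate(tree.body))
    return results


print(run_calls("<fn_1>(<fn_0>())<fn_end>"))
print(run_calls("<fn_0>()<fn_end>; <fn_2>('Ana', 'hi')<fn_end>"))
```

Walking the syntax tree instead of calling `eval` means only functions present in the table can run, and every argument must be either a literal or another registered call.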

Good For:

  • Developers building AI agents for edge computing and Android applications requiring fast and accurate function calling.
  • Use cases where efficient execution of Android APIs is critical, such as smart device control or specialized mobile applications.
  • Scenarios demanding high function call accuracy with minimal latency on resource-constrained devices.