watt-ai/watt-tool-8B

Parameters: 8B
Precision: FP8
Context length: 32768 tokens
Released: Dec 19, 2024
License: apache-2.0
Source: Hugging Face
Overview

watt-tool-8B is an 8-billion-parameter language model fine-tuned from Llama-3.1-8B-Instruct, with a 32768-token context window. It is optimized for complex tool usage and multi-turn dialogue scenarios. The model achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard (BFCL), demonstrating strong function calling and tool orchestration.
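Function-calling models like this one are typically driven by structured tool schemas embedded in the prompt. A minimal sketch of that setup, assuming an OpenAI-style JSON schema and a generic system-prompt layout (the exact prompt format watt-tool-8B expects may differ; consult its model card):

```python
import json

# Hypothetical tool definition in OpenAI-style JSON schema.
# The schema shape watt-tool-8B was trained on may differ.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

def build_messages(user_query, tools):
    """Embed the tool schemas in a system prompt and append the user turn."""
    system = (
        "You are a helpful assistant with access to the following tools. "
        "When a tool is needed, respond with a function call.\n"
        + json.dumps(tools, indent=2)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]

messages = build_messages("What's the weather in Paris?", [get_weather_tool])
```

The resulting `messages` list is what would be passed to the model's chat template at inference time.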

Key Capabilities

  • Enhanced Tool Usage: Precisely selects and executes tools based on user requests.
  • Multi-Turn Dialogue: Maintains context and effectively utilizes tools across multiple conversational turns, enabling the completion of more intricate tasks.
  • High Performance: Demonstrated top-tier results on the BFCL, validating its capabilities in agentic workflows.
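On the application side, "selecting and executing tools" usually reduces to parsing the call the model emits and dispatching it against a registry of real functions. An illustrative sketch, assuming a Python-style call syntax (this is not necessarily watt-tool-8B's exact output format):

```python
import ast

# Hypothetical tool registry; a real application registers its own functions.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "add": lambda a, b: a + b,
}

def dispatch(call_text):
    """Parse a call like get_weather(city='Paris') and execute the matching tool."""
    node = ast.parse(call_text, mode="eval").body
    if not isinstance(node, ast.Call) or node.func.id not in TOOLS:
        raise ValueError(f"Unknown tool call: {call_text}")
    # literal_eval keeps argument parsing safe (no arbitrary code execution).
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return TOOLS[node.func.id](*args, **kwargs)

print(dispatch("get_weather(city='Paris')"))  # Sunny in Paris
```

Parsing with `ast` rather than `eval` means a malformed or malicious model output raises an error instead of executing arbitrary code.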

Training Methodology

The model was trained with supervised fine-tuning (SFT) on a specialized dataset tailored for tool usage and multi-turn interactions, using Chain-of-Thought (CoT) techniques to synthesize high-quality multi-turn dialogue data. The training process draws on principles from the paper "Direct Multi-Turn Preference Optimization for Language Agents", combining SFT with DMPO (Direct Multi-Turn Preference Optimization) to boost performance on multi-turn agent tasks.
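For intuition, DMPO builds on the DPO family of preference losses, which reward the policy for widening its preference margin between a chosen and a rejected response relative to a frozen reference model; DMPO extends this idea to multi-turn trajectories. A toy computation of the standard single-turn DPO loss on made-up log-probabilities (the numbers are illustrative only; see the DMPO paper for the multi-turn formulation):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative log-probs: the policy prefers the chosen response
# more strongly than the reference model does, so the loss is low.
loss = dpo_loss(logp_chosen=-5.0, logp_rejected=-9.0,
                ref_chosen=-6.0, ref_rejected=-8.0, beta=0.1)
```

When the policy's margin matches the reference's, the loss sits at log 2; it falls as the policy learns to favor the chosen trajectory.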

Good For

  • Developing AI workflow-building tools, similar to platforms such as Lupan and Coze.
  • Applications requiring robust function calling and tool execution.
  • Building conversational agents that need to interact with external tools over extended dialogues.
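A conversational agent built on such a model typically alternates model turns and tool turns, appending each tool result to the history so the model can use it on the next turn. A minimal loop with a mocked model stand-in (replace `mock_model` with real inference against watt-tool-8B; the message roles shown are an assumption, not the model's required format):

```python
def mock_model(history):
    """Stand-in for model inference: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in history):
        return {"role": "assistant", "tool_call": "lookup('shipping status')"}
    return {"role": "assistant", "content": "Your package arrives tomorrow."}

def run_tool(call):
    # Hypothetical tool execution; a real agent would dispatch on the call.
    return {"role": "tool", "content": f"result of {call}"}

def agent_loop(user_message, max_turns=4):
    """Alternate model and tool turns until the model emits a final answer."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = mock_model(history)
        history.append(reply)
        if "tool_call" not in reply:
            break  # final answer reached
        history.append(run_tool(reply["tool_call"]))
    return history

history = agent_loop("Where is my package?")
```

The `max_turns` cap is a simple safeguard against a model that keeps requesting tools indefinitely.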