ModelScope Llama3-8B-Agent-Instruct-V2 Overview
This model is an 8-billion-parameter, instruction-tuned Llama 3 variant developed by ModelScope and specialized for agentic tasks. It was fine-tuned on the comprehensive MSAgent-Pro dataset, and training used a loss_scale technique intended to improve training efficiency and performance.
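The card does not describe how loss_scale works internally. One common way to realize a loss-scaling idea is to weight the per-token loss so that some response tokens (for example, tokens in key agent fields) count more than others. The sketch below shows only that generic technique in plain Python; the per-token weights and the normalization choice are assumptions, not ModelScope's implementation.

```python
import math
from typing import Sequence


def scaled_token_loss(target_probs: Sequence[float], loss_weights: Sequence[float]) -> float:
    """Weighted mean negative log-likelihood over response tokens.

    target_probs[i]: probability the model assigned to the correct token i (must be > 0).
    loss_weights[i]: hypothetical per-token scale, e.g. > 1.0 for tokens in key
    agent fields and 0.0 for tokens excluded from the loss.
    """
    if len(target_probs) != len(loss_weights):
        raise ValueError("target_probs and loss_weights must have the same length")
    counted = [w for w in loss_weights if w > 0]
    if not counted:
        return 0.0
    # Each counted token contributes its NLL multiplied by its weight.
    total = sum(w * -math.log(p) for p, w in zip(target_probs, loss_weights) if w > 0)
    # Normalize by the number of counted tokens (an assumption; dividing by
    # sum(loss_weights) is another reasonable choice).
    return total / len(counted)


# All weights 1.0 reduces to ordinary mean NLL; a weight of 2.0 doubles that token's share.
print(scaled_token_loss([0.9, 0.5, 0.8], [1.0, 2.0, 1.0]))
```

With every weight set to 1.0 this is the usual token-averaged cross-entropy, so the weights only change how much each token pulls on the gradient.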
Key Capabilities & Performance
The llama3-8b-agent-instruct-v2 model significantly outperforms the base llama3-8b-instruct model on the challenging ToolBench evaluation set, both in-domain and out-of-domain. Key improvements include:
- Enhanced Planning (Plan.EM): Achieves 85.15% (in-domain) and 85.79% (out-of-domain), roughly 15 to 16 percentage points above the base model's ~70%.
- Superior Action Execution (Act.EM): Reaches 58.1% (in-domain) and 59.43% (out-of-domain), nearly doubling the base model's performance.
- Reduced Hallucination Rate: Drastically lowers hallucination to 1.57% (in-domain) and 2.56% (out-of-domain), indicating higher reliability in tool-use scenarios.
- Improved Overall F1 Score: Demonstrates an Avg.F1 of 52.10% (in-domain) and 52.19% (out-of-domain).
These results indicate a strong ability to understand, plan, and execute complex tasks that require external tool calls, with higher accuracy and fewer hallucinated actions than the base model.
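To make exact-match and hallucination-style metrics concrete, here is a minimal sketch. It assumes that an action is a tool name plus an argument dictionary, that an action exact match requires both to equal the reference, and that a hallucination is a call to a tool not offered to the model. These definitions are illustrative assumptions; the official ToolBench evaluation scripts are authoritative, and the Action type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Action:
    """Hypothetical representation of one tool call."""
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)


def action_exact_match(preds: List[Action], refs: List[Action]) -> float:
    """Share of turns whose predicted tool name AND arguments equal the reference."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must be aligned turn by turn")
    if not refs:
        return 0.0
    hits = sum(p.tool == r.tool and p.args == r.args for p, r in zip(preds, refs))
    return hits / len(refs)


def hallucination_rate(preds: List[Action], available_tools: Set[str]) -> float:
    """Share of predicted calls that name a tool the model was not given."""
    if not preds:
        return 0.0
    return sum(p.tool not in available_tools for p in preds) / len(preds)


refs = [Action("weather", {"city": "Paris"}), Action("search", {"q": "llama 3"})]
preds = [Action("weather", {"city": "Paris"}), Action("web_search", {"q": "llama 3"})]
print(action_exact_match(preds, refs))                 # 0.5
print(hallucination_rate(preds, {"weather", "search"}))  # 0.5
```

In this toy example the second prediction invents a tool name, so it fails the exact match and also counts as a hallucination.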
Ideal Use Cases
- Agentic AI Systems: Building intelligent agents that can interact with various tools and APIs.
- Automated Workflow Execution: Tasks requiring sequential decision-making and tool invocation.
- Complex Problem Solving: Scenarios where models need to break down problems and use external resources to find solutions.
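In these use cases the application typically parses a tool call out of the model's text and runs it. The sketch below assumes a ReAct-style output with an "Action:" line followed by an "Action Input:" JSON object; that format, the tool registry, and the get_weather tool are hypothetical, so check the model's actual prompt template before relying on it.

```python
import json
import re
from typing import Any, Callable, Dict, Optional, Tuple

# Assumed output format: "Action: <tool>\nAction Input: {json}".
_ACTION_RE = re.compile(r"Action:\s*(?P<tool>[^\n]+?)\s*\nAction Input:\s*")


def parse_action(text: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, args) if the text contains a tool call, else None."""
    m = _ACTION_RE.search(text)
    if m is None:
        return None
    try:
        # raw_decode parses one JSON value starting at m.end() and ignores trailing text.
        args, _ = json.JSONDecoder().raw_decode(text, m.end())
    except json.JSONDecodeError:
        return None
    if not isinstance(args, dict):
        return None
    return m.group("tool"), args


def run_step(text: str, tools: Dict[str, Callable[..., str]]) -> Optional[str]:
    """Execute the parsed tool call and return its observation string."""
    parsed = parse_action(text)
    if parsed is None:
        return None  # No tool call: treat the text as a final answer.
    name, args = parsed
    if name not in tools:
        return f"Error: unknown tool {name!r}"
    return tools[name](**args)


tools = {"get_weather": lambda city: f"Sunny in {city}"}  # hypothetical tool
output = 'Thought: I need the weather.\nAction: get_weather\nAction Input: {"city": "Paris"}\n'
print(run_step(output, tools))  # Sunny in Paris
```

Returning an error string for unknown tools, rather than raising, lets the agent loop feed it back to the model as an observation.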