mzbac/llama-3-8B-Instruct-function-calling

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 27, 2024 · License: llama3 · Architecture: Transformer

The mzbac/llama-3-8B-Instruct-function-calling model is a fine-tuned variant of meta-llama/Meta-Llama-3-8B-Instruct, specifically optimized for function calling. Developed by mzbac, this model excels at accurately parsing user prompts and provided tool definitions and generating structured function calls from them. It addresses common issues found in function-calling datasets by being re-trained on cleaned data, making it a reliable choice for integrating external tools and APIs into LLM applications.


Overview

This model is a specialized fine-tune of the Meta-Llama-3-8B-Instruct base model, developed by mzbac. Its primary focus is to enhance function calling capabilities, allowing it to reliably interact with external tools and APIs. The model was re-trained on a cleaned version of the glaive-function-calling-v2 dataset to mitigate issues like invalid JSON and incorrect argument formatting, which were present in the original dataset.
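The cleaning step can be approximated with a simple validation pass over the dataset. The sketch below is illustrative only: the record layout and field names (`function_call`, `name`, `arguments`) are assumptions rather than the actual glaive-function-calling-v2 schema, which would need to be mapped into this shape first. The idea is simply to drop records whose function-call arguments do not parse as valid JSON.

```python
import json

# Hypothetical records; real glaive-function-calling-v2 rows use their own
# layout and would need to be converted into this shape first.
raw_records = [
    {"function_call": {"name": "get_time", "arguments": '{"timezone": "UTC"}'}},
    {"function_call": {"name": "get_time", "arguments": "{'timezone': UTC}"}},  # invalid JSON
]

def has_valid_function_call(record: dict) -> bool:
    """Keep only records whose call has a name and JSON-parsable arguments."""
    call = record.get("function_call")
    if not isinstance(call, dict) or "name" not in call:
        return False
    try:
        json.loads(call["arguments"])  # the arguments string must itself be valid JSON
        return True
    except (json.JSONDecodeError, TypeError):
        return False

cleaned = [r for r in raw_records if has_valid_function_call(r)]
print(len(cleaned))  # -> 1
```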

Key Capabilities

  • Reliable Function Calling: Accurately parses user requests and generates structured JSON function calls, including tool names and arguments (see the usage sketch after this list).
  • Improved Data Quality: Benefits from training on a meticulously cleaned dataset, reducing common errors in function call generation.
  • Integration with Tools: Facilitates seamless integration of large language models with external functions, enabling dynamic and interactive applications.
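To illustrate the tool-calling flow end to end, the sketch below loads the model with Hugging Face transformers, describes one hypothetical tool (`get_current_weather`) in the system message, and asks the model to emit a structured call. The prompt layout is an assumption for illustration; check the model card for the exact template the fine-tune expects.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mzbac/llama-3-8B-Instruct-function-calling"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Hypothetical tool definition; the model is expected to answer with a
# structured call naming this tool and filling in its arguments.
tool = {
    "name": "get_current_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

messages = [
    {"role": "system", "content": f"You have access to this tool:\n{tool}"},
    {"role": "user", "content": "What's the weather like in Berlin right now?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
# Illustrative expected output:
# {"name": "get_current_weather", "arguments": {"city": "Berlin"}}
```

In a real application the decoded text would be parsed (for example with `json.loads`) and validated against the tool schema before the call is dispatched.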

Training Details

The model was fine-tuned using LoRA (Low-Rank Adaptation) with the following hyperparameters; a configuration sketch follows the list:

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • LoRA Layers: 32 layers were fine-tuned.
  • Iterations: Trained for 6000 iterations with a batch size of 1.
  • Max Sequence Length: Supports a maximum sequence length of 8192 tokens.
  • LoRA Parameters: Rank of 128, alpha of 256, and a scale of 10.0, applied to key attention and MLP projections.
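For reference, the listed hyperparameters map roughly onto a Hugging Face PEFT configuration as sketched below. This is not the author's actual training setup: the original run used its own stack, PEFT has no separate scale knob (it derives the scaling factor as lora_alpha / r, so the reported scale of 10.0 does not carry over directly), and the dropout value and target module names are assumptions.

```python
from peft import LoraConfig

# Rough PEFT equivalent of the hyperparameters listed above (a sketch, not
# the author's training script).
lora_config = LoraConfig(
    r=128,              # LoRA rank
    lora_alpha=256,     # LoRA alpha
    lora_dropout=0.05,  # assumed; not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    # "key attention and MLP projections" from the card, mapped to the usual
    # Llama module names (an assumption).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```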