empower-dev-staging/empower-functions-small-v1-1-lc-2
The empower-dev-staging/empower-functions-small-v1-1-lc-2 is an 8 billion parameter Llama 3.1-based causal language model, fine-tuned by empower-dev-staging for enhanced function calling capabilities. This model is specifically optimized to understand and generate responses that involve tool use and function invocation, building upon the robust foundation of Meta's Llama 3.1-8B-Instruct. It is designed for applications requiring precise interaction with external tools and APIs, offering a 32768 token context length for complex function calling scenarios.
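The card does not document the model's exact output format for function invocations, but a downstream application typically needs to detect and parse a structured tool call in the completion. The sketch below assumes an illustrative JSON format (`{"name": ..., "arguments": {...}}`); the field names and the `get_weather` tool are hypothetical, not documented behavior of this checkpoint.

```python
import json

def parse_tool_call(completion: str):
    """Parse a JSON tool call emitted by the model.

    Assumes the model returns an object like
    {"name": ..., "arguments": {...}} -- an illustrative format,
    not documented behavior of this checkpoint.
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return None  # plain-text reply, no tool call
    if isinstance(call, dict) and "name" in call and "arguments" in call:
        return call["name"], call["arguments"]
    return None

# Hypothetical completion from the model:
completion = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(parse_tool_call(completion))  # ('get_weather', {'city': 'Paris'})
```

In practice the parsed name and arguments would be dispatched to the matching external API, and the result fed back into the conversation.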
Model Overview
empower-functions-small-v1-1-lc-2 is fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct and specializes in function calling, leveraging a diverse dataset to improve its ability to identify and invoke external tools or APIs based on user prompts. It was trained with Axolotl, with a focus on conversational tool-use scenarios.
Key Capabilities
- Function Calling: Optimized for understanding when and how to invoke functions, making it suitable for applications requiring interaction with external systems.
- Llama 3.1 Base: Benefits from the strong foundational capabilities of the Meta Llama 3.1-8B-Instruct model, including its reasoning and instruction-following abilities.
- Context Length: Offers a 32768 token context window (the fine-tuning run used a 4096 token training sequence length), enabling the processing of long conversational histories and complex function calling instructions.
- Efficient Training: Fine-tuned with LoRA (r=16, alpha=32) and Flash Attention, ensuring efficient adaptation while maintaining performance.
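A quick back-of-envelope calculation shows why the LoRA setup above (r=16, alpha=32) is cheap: only small low-rank adapter matrices are trained while the base weights stay frozen. The 4096x4096 projection shape below is an illustrative Llama-scale example, not taken from this card.

```python
def lora_params(d_out: int, d_in: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one weight matrix:
    A is (r, d_in) and B is (d_out, r), so r * (d_in + d_out)."""
    return r * (d_in + d_out)

# Illustrative shape: a 4096 x 4096 attention projection.
full = 4096 * 4096                       # frozen base weight: 16,777,216 params
adapter = lora_params(4096, 4096, r=16)  # 131,072 trainable params
print(adapter / full)                    # 0.0078125 -> under 1% of the matrix
```

The alpha=32 setting only rescales the adapter output (by alpha/r) and adds no parameters of its own.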
Training Details
The model was trained for one epoch with a learning rate of 0.0002, using an Adam optimizer with a cosine learning rate schedule. Training mixed datasets covering function usage, non-usage, and parallel function calls, all formatted as Llama-3 conversations. The final validation loss of 0.0968 indicates effective learning during the fine-tuning process.
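The cosine schedule mentioned above decays the learning rate smoothly from its 0.0002 peak toward zero over the run. A minimal sketch of that decay curve (ignoring any warmup phase, which the card does not specify):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-4) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))

total = 1000  # hypothetical total step count for illustration
print(cosine_lr(0, total))           # 2e-4 (peak, at the start)
print(cosine_lr(total // 2, total))  # ~1e-4 (half the peak, halfway through)
print(cosine_lr(total, total))       # 0.0 (fully decayed at the end)
```

Compared with a linear schedule, the cosine curve holds the rate near its peak early on and decays fastest in the middle of training, which tends to stabilize the final epochs.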