Model Overview
Locutusque/Llama-3-Hercules-5.0-8B is an 8-billion-parameter language model fine-tuned from the Llama-3-8B base model. It is designed to strengthen instruction following, function calling, and conversational fluency across scientific and technical disciplines. The model was trained on 8 Kaggle TPUs using torch_xla SPMD for high MXU efficiency, with a learning rate of 2e-5, the Adam optimizer, and a linear scheduler.
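The linear schedule mentioned above can be sketched in plain Python. The peak learning rate (2e-5) comes from the card; the step counts and the optional warmup phase are hypothetical values for illustration, not details of the actual training run.

```python
def linear_lr(step: int, total_steps: int, peak_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Optional linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Hypothetical 1000-step run with no warmup: lr falls linearly from 2e-5 to 0.
schedule = [linear_lr(s, 1000) for s in (0, 250, 500, 750, 1000)]
```

A framework scheduler (e.g. a linear decay schedule from a training library) would normally replace this hand-rolled function; the sketch only shows the shape of the decay.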
Key Capabilities
- Complex Instruction Following: Accurately interprets and executes multi-step instructions, including those with specialized terminology.
- Function Calling: Parses and executes function calls, passing the correct input arguments and returning structured outputs.
- Domain-Specific Knowledge: Engages in informative conversations across Biology, Chemistry, Physics, Mathematics, Medicine, and Computer Science.
- Code Generation and Execution: Facilitates code execution via function calls, supporting software development and prototyping.
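The function-calling loop behind the capabilities above can be sketched as follows: the model emits a call as JSON, the host parses it, runs the named tool, and feeds the result back. The JSON shape, the `dispatch` helper, and the `add_numbers` tool are illustrative assumptions, not the model's documented call format.

```python
import json

# Hypothetical tool registry; a real application registers its own functions.
def add_numbers(a: float, b: float) -> float:
    return a + b

TOOLS = {"add_numbers": add_numbers}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and execute it."""
    call = json.loads(model_output)          # e.g. {"name": ..., "arguments": {...}}
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    # The result would normally be returned to the model as a tool message.
    return json.dumps({"name": call["name"], "content": result})

reply = dispatch('{"name": "add_numbers", "arguments": {"a": 2, "b": 3}}')
```

In practice the host would also validate the call against a schema before executing it, since model-emitted JSON can be malformed.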
Intended Uses
This model is well-suited for applications requiring deep technical understanding and precise execution:
- Specialized Chatbots: Ideal for creating knowledgeable conversational agents in scientific and technical fields.
- Instructional Assistants: Supports users with educational and step-by-step guidance across various disciplines.
Training Details
The model was trained for 2 epochs on the full Hercules-v5.0 dataset, with a total batch size of 128 and bfloat16 precision. It follows OpenAI's ChatML prompt format, with a slightly modified structure to accommodate function calling. Training leveraged the TPU-Alignment repository by Locutusque.
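The ChatML structure referenced above can be sketched in plain Python. The `<|im_start|>`/`<|im_end|>` delimiters are standard ChatML; the example system message is an assumption, and this sketch does not reproduce the card's "slightly modified" function-calling extensions.

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful scientific assistant."},
    {"role": "user", "content": "State Le Chatelier's principle briefly."},
])
```

With a tokenizer that ships a chat template, `tokenizer.apply_chat_template` would normally produce this string instead of hand-formatting it.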
Limitations
Users should be aware of several limitations: the underlying datasets may contain toxic or harmful examples, the model can hallucinate and produce factual errors, and its technical capabilities create potential for misuse.