Overview
Hercules-3.0-Mistral-7B is a 7-billion-parameter language model developed by Locutusque and fine-tuned from mistralai/Mistral-7B-v0.1. It was trained on the Hercules-v3.0 dataset, which expands OpenHermes-2.5 with numerous curated datasets to strengthen instruction following, function calling, and domain-specific conversation.
Key Capabilities
- Complex Instruction Following: Accurately executes multi-step instructions, including those with specialized terminology.
- Function Calling: Interprets function-call requests and emits calls with appropriate arguments and return values.
- Domain-Specific Knowledge: Engages in informative conversations across scientific and technical fields such as Biology, Chemistry, Physics, Mathematics, Medicine, and Computer Science.
Training Details
The model was fine-tuned on 1,400,000 examples from the Hercules-v3.0 dataset using 8 Kaggle TPUs, with a learning rate of 2e-06, the Adam optimizer, a linear scheduler, and bfloat16 precision. It follows OpenAI's ChatML prompt format, adapted for function calling.
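ChatML wraps each conversation turn in `<|im_start|>role` / `<|im_end|>` delimiters and leaves an open assistant turn for generation. A minimal sketch of building such a prompt by hand (the exact function-calling adaptation, e.g. any dedicated tool role, is not documented here and is left out):

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts into a ChatML string.

    Each turn becomes:
        <|im_start|>role
        content<|im_end|>
    followed by an open assistant turn for the model to complete.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful science tutor."},
    {"role": "user", "content": "Explain osmosis in one sentence."},
])
```

In practice, tokenizers that ship a chat template (e.g. `tokenizer.apply_chat_template` in `transformers`) handle this formatting automatically; the sketch only illustrates the underlying layout.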
Intended Uses
- Specialized Chatbots: Ideal for creating knowledgeable conversational agents in scientific and technical domains.
- Instructional Assistants: Supports users with educational and step-by-step guidance across various disciplines.
- Code Generation and Execution: Facilitates code execution through function calls, aiding in software development and prototyping.
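To illustrate the function-calling flow, a caller might parse a JSON call emitted by the model and dispatch it to a registered Python function. The call schema and tool names below are hypothetical, for illustration only, and are not the model's documented interface:

```python
import json

# Hypothetical tool registry; names and signatures are illustrative.
TOOLS = {
    "add": lambda a, b: a + b,
    "to_celsius": lambda fahrenheit: (fahrenheit - 32) * 5 / 9,
}

def dispatch(model_output: str):
    """Parse a JSON call like {"name": ..., "arguments": {...}} from the
    model's text and execute the matching registered tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown function: {call['name']}")
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

The dispatcher would typically feed `result` back to the model as a new turn so it can compose a natural-language answer.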
Limitations and Risks
Users should be aware of potential biases inherited from the underlying data sources, the possibility of toxic output, and the risk of hallucinations or factual errors, especially in highly specialized domains. The developer notes that one of the datasets included in Hercules-v3.0 degraded performance and recommends using Hercules-2.5 until Hercules-3.1 is released.