Locutusque/Hercules-3.0-Mistral-7B

TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:8kPublished:Feb 17, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Hercules-3.0-Mistral-7B by Locutusque is a 7 billion parameter language model fine-tuned from Mistralai/Mistral-7B-v0.1 with an 8192 token context length. It is specifically designed for complex instruction following, function calling, and conversational interactions across scientific and technical domains. This model excels in specialized chatbots and instructional assistants, particularly in Biology, Chemistry, Physics, Mathematics, Medicine, and Computer Science.

Loading preview...

Overview

Hercules-3.0-Mistral-7B is a 7 billion parameter language model developed by Locutusque, fine-tuned from Mistralai/Mistral-7B-v0.1. It leverages the Hercules-v3.0 dataset, which expands upon OpenHermes-2.5 with numerous curated datasets, to enhance its capabilities in instruction following, function calls, and domain-specific conversations.

Key Capabilities

  • Complex Instruction Following: Accurately executes multi-step instructions, including those with specialized terminology.
  • Function Calling: Interprets and executes function calls, providing appropriate input and output values.
  • Domain-Specific Knowledge: Engages in informative conversations across scientific and technical fields such as Biology, Chemistry, Physics, Mathematics, Medicine, and Computer Science.

Training Details

The model was fine-tuned on 1,400,000 examples of the Hercules-v3.0 dataset using 8 Kaggle TPUs. It utilized a learning rate of 2e-06 with the Adam optimizer and a linear scheduler. The model was trained using the bfloat16 dtype and adheres to OpenAI's ChatML prompt format, adapted for function calling.

Intended Uses

  • Specialized Chatbots: Ideal for creating knowledgeable conversational agents in scientific and technical domains.
  • Instructional Assistants: Supports users with educational and step-by-step guidance across various disciplines.
  • Code Generation and Execution: Facilitates code execution through function calls, aiding in software development and prototyping.

Limitations and Risks

Users should be aware of potential biases from underlying data sources, the possibility of generating toxic content, and the risk of hallucinations or factual errors, especially in highly specialized domains. The developer notes that a dataset used in Hercules-v3.0 caused performance degradation and recommends using Hercules-2.5 until Hercules-3.1 is released.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p