Locutusque/Hercules-4.0-Yi-34B

TEXT GENERATIONConcurrency Cost:2Model Size:34BQuant:FP8Ctx Length:32kPublished:Apr 2, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Hercules-4.0-Yi-34B is a 34 billion parameter language model fine-tuned from the Yi-34B architecture by Locutusque. It specializes in complex instruction following, function calls, and conversational interactions across scientific and technical domains. This model excels in subjects like Biology, Chemistry, Physics, Mathematics, Medicine, and Computer Science, making it suitable for specialized chatbots and instructional assistants.

Loading preview...

Hercules-4.0-Yi-34B: Specialized Instruction Following and Function Calling

Hercule-4.0-Yi-34B is a 34 billion parameter language model, fine-tuned from the Yi-34B base model by Locutusque. It is specifically designed to enhance performance in complex instruction following, function calling, and engaging in detailed conversations within scientific and technical fields. The model's training utilized the Hercules-v4.0 dataset, which expands upon OpenHermes-2.5 with additional curated data.

Key Capabilities

  • Complex Instruction Following: Accurately executes multi-step instructions, including those with specialized terminology.
  • Function Calling: Interprets and executes function calls, providing appropriate input and output values.
  • Domain-Specific Knowledge: Engages in informative discussions across Biology, Chemistry, Physics, Mathematics, Medicine, and Computer Science.

Intended Uses

  • Specialized Chatbots: Ideal for creating knowledgeable conversational agents in scientific and technical domains.
  • Instructional Assistants: Supports users with educational and step-by-step guidance.
  • Code Generation and Execution: Facilitates code execution through function calls, aiding in software development.

Training Details

The model was fine-tuned on 75,000 examples of the Hercules-v4.0 dataset using 8 Kaggle TPUs. It employed a learning rate of 1e-4 with bfloat16 precision and a total batch size of 64. LoRA was used to freeze approximately 97% of the model parameters. The model is trained to use OpenAI's ChatML prompt format, adapted for function calling capabilities.

Limitations

Users should be aware of potential biases from underlying data sources, the risk of hallucinations or factual errors, and the possibility of misuse due due to its technical conversation and function call abilities.