M4-ai/Hercules-Qwen1.5-14B

Parameters: 14.2B
Precision: FP8
Context length: 32768
Released: Mar 30, 2024
License: other
Source: Hugging Face

Hercules-Qwen1.5-14B Overview

Hercules-Qwen1.5-14B is a 14.2 billion parameter language model developed by M4-ai, built on the Qwen1.5-14B base model. It was fine-tuned on 700,000 examples from the Hercules-v4 dataset to strengthen its capabilities across a broad range of domains.
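
The following is a minimal inference sketch using the transformers library. It assumes the tokenizer ships with a suitable chat template, and the prompt content is purely illustrative.

```python
# Minimal inference sketch; the chat template and prompt below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "M4-ai/Hercules-Qwen1.5-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory use manageable for 14B weights
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```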

Key Capabilities

  • Mathematical Reasoning: Demonstrates proficiency in solving mathematical problems.
  • Code Generation: Capable of generating and understanding code.
  • Function Calling: Supports function calling mechanisms for integration with external tools (a prompt sketch follows this list).
  • Roleplay: Excels in conversational roleplay scenarios.
  • General Purpose Assistant: Functions effectively as a versatile assistant for diverse queries.
  • Question Answering: Provides accurate answers to a wide range of questions.
  • Chain-of-Thought: Supports complex reasoning through chain-of-thought processes.
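
Because the exact tool-calling schema is not documented in this overview, the sketch below only illustrates a generic pattern: a tool is described in the system prompt and the model is asked to reply with a JSON call. The get_weather tool, its parameters, and the JSON shape are assumptions made for illustration.

```python
# Illustrative function-calling prompt; the tool name and JSON schema are
# assumptions for demonstration, not a documented interface of this model.
import json

tools = [
    {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {"city": "string"},
    }
]

system_prompt = (
    "You are a function-calling assistant. You may call these tools by replying "
    'with a JSON object of the form {"name": ..., "arguments": {...}}:\n'
    + json.dumps(tools, indent=2)
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What's the weather like in Oslo right now?"},
]
# `messages` can then be passed through tokenizer.apply_chat_template and
# model.generate exactly as in the inference sketch above; the reply would be
# parsed with json.loads and dispatched to the matching local function.
```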

Training Details

The model was fine-tuned on the Hercules-v4.0 dataset using a non-mixed bf16 precision regime. Training ran on 8 Kaggle TPUs with a global batch size of 128 and a sequence length of 1024 tokens.
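
As a point of reference, the sketch below shows how the reported hyperparameters (global batch size 128, sequence length 1024, bf16) might map onto a standard transformers TrainingArguments configuration. The per-device batch size of 16 across 8 devices, the output path, and the epoch count are assumptions; the actual TPU training setup used by M4-ai is not reproduced here.

```python
# Hypothetical mapping of the reported hyperparameters onto TrainingArguments.
# The 16 x 8 split of the global batch size and the paths below are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hercules-qwen1.5-14b-sft",  # hypothetical output path
    per_device_train_batch_size=16,         # 16 examples x 8 devices = global batch of 128
    gradient_accumulation_steps=1,
    bf16=True,                              # bf16 training, approximating the reported non-mixed bf16 regime
    num_train_epochs=1,                     # assumed; not stated in the card
    logging_steps=50,
)
# The sequence length is typically enforced at tokenization time, e.g. by
# truncating or packing examples to 1024 tokens before they reach the trainer.
```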

Good For

This model is suitable for developers and researchers looking for a robust, general-purpose language model with strong performance in specialized areas like coding, math, and function calling. Its broad fine-tuning makes it adaptable for various assistant-like applications and complex reasoning tasks.