Hercules-Qwen1.5-14B Overview
Hercules-Qwen1.5-14B is a 14.2 billion parameter language model developed by M4-ai, built on the Qwen1.5-14B architecture. It was fine-tuned on 700,000 examples from the Hercules-v4.0 dataset to strengthen its capabilities across the domains listed below.
Key Capabilities
- Mathematical Reasoning: Demonstrates proficiency in solving mathematical problems.
- Code Generation: Capable of generating and understanding code.
- Function Calling: Supports function calling mechanisms for integration with external tools.
- Roleplay: Excels in conversational roleplay scenarios.
- General-Purpose Assistant: Functions effectively as a versatile assistant for diverse queries.
- Question Answering: Provides accurate answers to a wide range of questions.
- Chain-of-Thought: Supports complex reasoning through chain-of-thought processes.
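Function calling in practice means the host application describes available tools in the prompt, the model emits a structured call (commonly JSON), and the application parses and executes it. A minimal sketch of that loop, where the tool name, schema, and helper functions are illustrative assumptions rather than part of this model's documented interface:

```python
import json

# Hypothetical tool registry: maps tool names to Python callables.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

# Schema the host app would describe to the model in its system prompt
# (illustrative format; the exact schema is up to the application).
TOOL_SCHEMA = [
    {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {"city": {"type": "string"}},
    }
]

def dispatch_tool_call(model_output: str):
    """Parse a JSON tool call emitted by the model and execute it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulated model output for the prompt "What's the weather in Oslo?"
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = dispatch_tool_call(model_output)
print(result)  # {'city': 'Oslo', 'temp_c': 21}
```

In a real integration, `model_output` would come from the model's generated text, and the tool result would be appended to the conversation for a follow-up turn.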
Training Details
The model was fine-tuned on the Hercules-v4.0 dataset in bf16 (non-mixed) precision. Training ran on 8 Kaggle TPUs with a global batch size of 128 and a sequence length of 1024 tokens.
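Those numbers pin down the per-step token budget and a rough step count. A back-of-the-envelope check, assuming one example per sequence slot (the card does not say whether examples were packed, so that part is an assumption):

```python
global_batch_size = 128
seq_len = 1024
num_examples = 700_000  # Hercules-v4 example count from the card

# Tokens processed per optimizer step at full sequence length.
tokens_per_step = global_batch_size * seq_len

# Steps per epoch, assuming one example per sequence slot (no packing).
steps_per_epoch = -(-num_examples // global_batch_size)  # ceiling division

print(tokens_per_step)   # 131072
print(steps_per_epoch)   # 5469
```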
Good For
This model suits developers and researchers who need a robust, general-purpose language model with strong performance in specialized areas such as coding, math, and function calling. Its broad fine-tuning makes it adaptable to a wide range of assistant-style applications and complex reasoning tasks.
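Qwen1.5 chat models conventionally use the ChatML prompt format; assuming this fine-tune follows the same convention (the model's tokenizer chat template should be checked to confirm), a conversation can be rendered like this before generation:

```python
def build_chatml_prompt(messages):
    """Render {role, content} messages in ChatML format, ending with
    an open assistant turn for the model to complete."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Factor x^2 - 5x + 6."},
])
print(prompt)
```

When using the Hugging Face `transformers` library, the tokenizer's `apply_chat_template` method produces the correct format automatically, which is preferable to hand-rolled templating like the sketch above.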