# Hercules-5.0-Qwen2-1.5B Overview
M4-ai/Hercules-5.0-Qwen2-1.5B is a 1.5-billion-parameter language model developed by M4-ai, fine-tuned from the Qwen2-1.5B base model. It is designed as a general-purpose assistant and was fine-tuned on a high-quality mixed dataset. The model uses the ChatML prompt format.
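As a rough illustration of the ChatML format, the sketch below builds a prompt string by hand. The helper function and the system prompt text are illustrative assumptions, not part of the model card; in practice one would typically use a tokenizer's built-in chat template instead.

```python
# Minimal sketch of the ChatML prompt format used by Qwen2-based models.
# ChatML wraps each turn in <|im_start|>ROLE ... <|im_end|> markers and
# ends with an open assistant turn for the model to complete.
# build_chatml_prompt is a hypothetical helper, not from the model card.

def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts into a ChatML string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Open the assistant turn so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```

With the Hugging Face `transformers` library, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` produces the equivalent string from the model's bundled template.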
## Key Capabilities
- General-purpose assistance: handles a broad range of everyday assistant tasks.
- Mathematical reasoning: works through math-related problems.
- Code generation: proficient at coding tasks.
- Writing assistance: generates and helps refine written content.
- Question answering: answers direct questions effectively.
- Chain-of-thought reasoning: supports step-by-step reasoning for complex problems.
## Training Details
The model was fine-tuned on the Locutusque/hercules-v5.0 dataset. Training was conducted in bf16 (non-mixed) precision on 8 Kaggle TPU cores, with a global batch size of 256 and a sequence length of 1536 tokens. The developers plan to release a DPO (Direct Preference Optimization) version in the future.
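The reported hyperparameters imply a per-step token throughput. A small sketch of that arithmetic follows; it assumes the global batch size counts full-length sequences and that the batch is split evenly across the 8 TPU cores, neither of which is stated explicitly in the model card.

```python
# Derive per-step and per-device figures from the reported training setup.
# Assumptions: the global batch of 256 counts 1536-token sequences, and it
# is sharded evenly across the 8 TPU cores.
global_batch_size = 256   # sequences per optimizer step
sequence_length = 1536    # tokens per sequence
num_devices = 8           # Kaggle TPU cores

tokens_per_step = global_batch_size * sequence_length
per_device_batch = global_batch_size // num_devices

print(tokens_per_step)   # 393216 tokens processed per optimizer step
print(per_device_batch)  # 32 sequences per TPU core
```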
## Licensing and Language
Hercules-5.0-Qwen2-1.5B is released under the Apache-2.0 license. Its primary language is English, with potential capability in Chinese inherited from the Qwen2 base model.