Azure_Dusk-v0.2 Overview
Azure_Dusk-v0.2 is a 12-billion-parameter language model developed by Epiculous and fine-tuned from Mistral-Nemo-Base-2407. It follows Crimson_Dawn-v0.2, adding a substantially larger training dataset and using RSLoRA (rank-stabilized LoRA) instead of standard LoRA for fine-tuning.
Key Enhancements & Training Details
- Base Model: Built on Mistral-Nemo-Base-2407.
- Training Methodology: Uses RSLoRA (previous versions used standard LoRA) in a two-phase approach, with two epochs on RP (roleplay) data and two epochs on instruct data.
- Prompting Format: Trained exclusively on the ChatML format, so prompts should follow that format for best results.
- Hardware: Training was conducted on two NVIDIA A6000 GPUs.
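As a rough illustration of what separates RSLoRA from standard LoRA, the sketch below compares the two adapter scaling factors: standard LoRA scales the low-rank update by alpha / r, while rank-stabilized LoRA scales it by alpha / sqrt(r). The alpha and rank values are hypothetical, chosen only for illustration; the source does not state the hyperparameters used for Azure_Dusk-v0.2.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor: alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


# Hypothetical values for illustration; not taken from the model card.
alpha = 16.0
for rank in (8, 64, 256):
    print(
        f"rank={rank:>3}  "
        f"LoRA scale={lora_scale(alpha, rank):.4f}  "
        f"RSLoRA scale={rslora_scale(alpha, rank):.4f}"
    )
```

With standard scaling the adapter's contribution shrinks quickly as the rank grows, whereas the square-root scaling shrinks it more slowly, which is the motivation usually given for rank stabilization.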
Performance Metrics
Evaluations on the Open LLM Leaderboard show an average score of 14.03. Specific scores include:
- IFEval (0-Shot): 34.67
- BBH (3-Shot): 17.40
- MMLU-PRO (5-Shot): 22.60
Usage and Prompting
Users should adhere strictly to the ChatML prompting structure for best results. Example:
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
Quantized versions (exl2, gguf) are available for broader deployment.
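For programmatic use, the ChatML structure in the example above can be assembled with a small helper. This is a minimal sketch, not code from the model's authors: the function name `format_chatml` and the `add_generation_prompt` flag are hypothetical, and in practice the tokenizer's own chat template, if one is provided, may be the safer choice.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; it mirrors the turn layout
    shown in the example above.
    """
    parts = []
    for message in messages:
        # Each turn: <|im_start|>role, newline, content, <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([{"role": "user", "content": "Hi there!"}])
print(prompt)
```

The trailing open assistant turn signals where the model should begin generating; setting `add_generation_prompt=False` renders a completed conversation instead.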