Epiculous/Azure_Dusk-v0.2

Parameters: 12B
Tensor type: FP8
Context length: 32768
License: apache-2.0

Azure_Dusk-v0.2 Overview

Azure_Dusk-v0.2 is a 12-billion-parameter language model from Epiculous, fine-tuned from Mistral-Nemo-Base-2407. Relative to its predecessor, Crimson_Dawn-v0.2, it was trained on a substantially larger dataset and uses RSLoRA (rank-stabilized LoRA) for more efficient fine-tuning.

Key Enhancements & Training Details

  • Base Model: Built on Mistral-Nemo-Base-2407.
  • Training Methodology: Uses RSLoRA (rank-stabilized LoRA, replacing the standard LoRA of previous versions) with a two-phase approach: two epochs each on RP data and instruct data (see the sketch after this list).
  • Prompting Format: Trained exclusively on the ChatML format, so prompts must follow ChatML for the model to behave as intended.
  • Hardware: Training was conducted on two NVIDIA A6000 GPUs.
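
The RSLoRA change is small but concrete: instead of scaling adapter updates by alpha/r as in standard LoRA, it scales by alpha/sqrt(r), which stabilizes training at higher ranks. Below is a minimal sketch of a comparable setup using Hugging Face PEFT; the rank, alpha, and target modules are illustrative assumptions, since Epiculous has not published the exact training configuration.

# Minimal RSLoRA setup with Hugging Face PEFT. The hyperparameters
# (r, lora_alpha, target_modules) are assumed for illustration and are
# not Epiculous's published configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

config = LoraConfig(
    r=64,                  # adapter rank (assumed)
    lora_alpha=64,         # scaling numerator (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_rslora=True,       # scale by lora_alpha/sqrt(r) instead of lora_alpha/r
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapters train; the 12B base stays frozen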

Performance Metrics

Evaluations on the Open LLM Leaderboard show an average score of 14.03. Specific scores include:

  • IFEval (0-shot): 34.67
  • BBH (3-shot): 17.40
  • MMLU-PRO (5-shot): 22.60

Usage and Prompting

Users should adhere strictly to the ChatML prompting structure for best results. Example:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
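
Rather than assembling ChatML strings by hand, the prompt can be built with transformers, assuming the repository's tokenizer ships a ChatML chat template (worth verifying in its tokenizer_config.json):

from transformers import AutoTokenizer

# Assumes the repo's tokenizer defines a ChatML chat template;
# check tokenizer_config.json to confirm.
tokenizer = AutoTokenizer.from_pretrained("Epiculous/Azure_Dusk-v0.2")

messages = [{"role": "user", "content": "Hi there!"}]

# add_generation_prompt=True appends the opening <|im_start|>assistant
# tag so generation continues as the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)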

Quantized versions (exl2, gguf) are available for broader deployment.
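
As one way to run a gguf build locally, llama-cpp-python can load the file directly and handle ChatML formatting itself. The file name below is a placeholder, not a published artifact name:

from llama_cpp import Llama

# "azure-dusk-v0.2.Q4_K_M.gguf" is a placeholder; substitute the actual
# file name from whichever quantized release you download.
llm = Llama(
    model_path="azure-dusk-v0.2.Q4_K_M.gguf",
    n_ctx=32768,          # matches the advertised context length
    chat_format="chatml", # the format the model was trained on
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hi there!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])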