Epiculous/Crimson_Dawn-v0.2

Warm
Public
12B
FP8
32768
1
Sep 2, 2024
License: apache-2.0
Hugging Face

Crimson_Dawn-v0.2 is a 12 billion parameter language model developed by Epiculous, based on Mistral-Nemo-Base-2407, and fine-tuned with RSLoRA. This model features a 32768 token context length and is specifically trained using the ChatML format for conversational and instructional tasks. It incorporates significantly more training data than its predecessor, focusing on improved performance in instruction following.

Overview

Crimson_Dawn-v0.2 Overview

Epiculous's Crimson_Dawn-v0.2 is a 12 billion parameter language model built upon the Mistral-Nemo-Base-2407 architecture. This iteration significantly expands on the training methodology of v0.1 by incorporating substantially more data and utilizing RSLoRA for fine-tuning, a key departure from the previous regular LoRA approach. A notable change is its training on the ChatML format, moving away from Mistral Formatting, which dictates its prompting structure.

Key Capabilities & Training

  • Base Model: Derived from Mistral-Nemo-Base-2407.
  • Training Methodology: Employs RSLoRA for fine-tuning, with a two-phased approach over two epochs each on RP data and then instruct data.
  • Prompting Format: Optimized for ChatML, requiring specific <|im_start|>user and <|im_end|> tags for effective interaction.
  • Context Length: Supports a substantial context window of 32768 tokens.

Performance Metrics

Evaluations on the Open LLM Leaderboard show an average score of 14.82. Specific metrics include:

  • IFEval (0-Shot): 31.03
  • BBH (3-Shot): 21.69
  • MMLU-PRO (5-shot): 19.10

Use Cases

Crimson_Dawn-v0.2 is well-suited for applications requiring instruction-following and conversational AI, particularly where adherence to the ChatML format is feasible. Its enhanced training data and RSLoRA fine-tuning aim to improve its ability to understand and generate responses in a structured chat environment.