Model Overview
invalid-coder/dolphin-2.1-mistral-7b-snr-math-laser is a 7-billion-parameter model based on the Mistral 7B architecture. It was trained with a technique inspired by laserRMT that partially freezes the model based on a laser-like analysis. The method is designed to prevent catastrophic forgetting, a common problem when fine-tuning language models, which makes it well suited to teaching specific skills such as function calling.
Key Characteristics
- Catastrophic Forgetting Prevention: Partially freezing the model during training helps it retain previously learned knowledge while it acquires new skills.
- Uncensored and Compliant: The model is uncensored and complies with nearly any user request, including potentially unethical ones, so users need to implement their own alignment layer.
- Dataset: Trained on a modified version of the Dolphin 2.1 dataset (Dolphin is an open-source implementation of Microsoft's Orca) with uncensoring, deduplication, and quality improvements, augmented with Jon Durbin's Airoboros dataset for increased creativity.
- Training: Trained for 4 epochs over 48 hours on 4x A100 GPUs.
- Prompt Format: Uses the ChatML prompt format (<|im_start|>system, <|im_start|>user, <|im_start|>assistant).
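The ChatML layout named above can be sketched as a small prompt builder. This is a minimal illustration rather than the model's official tooling: the helper name `build_chatml_prompt` and the example system message are hypothetical, and the `<|im_end|>` turn terminator is taken from the general ChatML convention, not from this summary.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Format one system turn and one user turn as ChatML.

    The prompt ends with an open assistant turn so the model
    generates the reply from that point onward.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Hypothetical example messages, for illustration only.
prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "What is 12 * 7?",
)
print(prompt)
```

The resulting string would be passed to the tokenizer as-is. If the tokenizer used with the model ships a chat template, applying that template is likely the safer choice than hand-building the string.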
Performance Highlights
Evaluations on the Open LLM Leaderboard show competitive performance for a 7B model:
- Avg.: 53.47
- ARC (25-shot): 64.42
- HellaSwag (10-shot): 84.92
- MMLU (5-shot): 63.32
- TruthfulQA (0-shot): 55.56
- Winogrande (5-shot): 77.74
- GSM8K (5-shot): 20.77
- DROP (3-shot): 7.56
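As a quick sanity check, the reported average appears to be the unweighted mean of the seven benchmark scores listed above. The short sketch below recomputes it; the assumption that the leaderboard weights every benchmark equally is inferred from the numbers, not stated in this summary.

```python
# Benchmark scores as listed above.
scores = {
    "ARC (25-shot)": 64.42,
    "HellaSwag (10-shot)": 84.92,
    "MMLU (5-shot)": 63.32,
    "TruthfulQA (0-shot)": 55.56,
    "Winogrande (5-shot)": 77.74,
    "GSM8K (5-shot)": 20.77,
    "DROP (3-shot)": 7.56,
}

# Unweighted mean across all seven benchmarks (assumed weighting).
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")
```

The low GSM8K and DROP scores pull the mean well below the scores on the other five benchmarks.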
Licensing
This model is released under the Apache-2.0 license, which permits both commercial and non-commercial use.