Russian Adapted Mistral-7B (ruadapt_mistral_7b_v0.1)
This model is an adaptation of the mistralai/Mistral-7B-v0.1 base model, fine-tuned by rccmsu to improve its handling of the Russian language. The adaptation involved several key steps, including a complete tokenizer replacement to better suit Russian linguistic structure, as detailed in the paper "Impact of Tokenization on LLaMa Russian Adaptation".
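One motivation for replacing the tokenizer can be sketched in plain Python: Cyrillic characters occupy two bytes each in UTF-8, so a vocabulary built mostly from English data that falls back to byte-level pieces for Russian text tends to fragment words into many more units than it would for ASCII, inflating sequence lengths. The snippet below is purely illustrative and is not taken from the adaptation code.

```python
# Illustrative only: Cyrillic text is twice as "wide" in UTF-8 as ASCII,
# so byte-fallback tokenization is disproportionately costly for Russian.
ru = "привет"   # "hello" in Russian, 6 Cyrillic characters
en = "privet"   # the same word romanized, 6 ASCII characters

print(len(ru), len(ru.encode("utf-8")))  # 6 12 -> 2 bytes per character
print(len(en), len(en.encode("utf-8")))  # 6 6  -> 1 byte per character
```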
Key Adaptation & Training Details
The model underwent a staged training regimen:
- Tokenization Replacement: The original tokenizer was replaced to optimize for Russian text.
- Targeted Fine-tuning: Initial training updated only the embeddings and the language model head, for 0.8 epochs on a 33 GB Russian dataset.
- LoRA Integration: Subsequent training used LoRA (Low-Rank Adaptation) on 1% of the data to further fine-tune the embeddings, the LM head, and the linear layers and layer norms of the first and last four transformer blocks.
- Precision Handling: The model was converted to fp16 for training; the newly trained layers were then merged back into the original bf16 transformer.
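The block-selection scheme above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the module-name patterns assume the standard Hugging Face Mistral layout (`model.layers.N`, `model.embed_tokens`, `lm_head`); the only grounded specifics are that Mistral-7B has 32 decoder blocks and that the first and last four of them are targeted.

```python
NUM_LAYERS = 32  # mistralai/Mistral-7B-v0.1 has 32 decoder blocks


def lora_target_layers(num_layers: int = NUM_LAYERS, edge: int = 4) -> list[int]:
    """Indices of the first `edge` and last `edge` transformer blocks."""
    return list(range(edge)) + list(range(num_layers - edge, num_layers))


def target_module_prefixes(layers: list[int]) -> list[str]:
    """Hypothetical name prefixes for the tuned submodules: the linear
    layers and layer norms inside the selected blocks, plus the
    embeddings and LM head."""
    prefixes = [f"model.layers.{i}." for i in layers]
    prefixes += ["model.embed_tokens", "lm_head"]
    return prefixes


print(lora_target_layers())  # [0, 1, 2, 3, 28, 29, 30, 31]
```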
Performance Considerations
According to the developers, the adapted model scores slightly worse than the original mistralai/Mistral-7B-v0.1 across the evaluated datasets.
Good for:
- Applications requiring a Mistral-7B-based model with improved Russian language understanding.
- Research into the impact of tokenization and targeted fine-tuning for language adaptation.
- A base for further fine-tuning on specific Russian-language tasks where an already-adapted 7B model is preferred.