QM-4B: Qarachay-Malqar Language Support
QM-4B is a 4-billion parameter language model developed by TSjB, built upon the Qwen3-4B-Instruct-2507 architecture. Its primary differentiator is its specialized fine-tuning and tokenizer expansion to provide comprehensive support for the Qarachay-Malqar language (къарачай-малкъар тил), alongside strong capabilities in Russian and English.
Key Capabilities
- Enhanced Qarachay-Malqar Support: Features an expanded tokenizer with significantly increased representation for Qarachay-Malqar symbols, improving linguistic accuracy and fluency.
- Multilingual Generation: Capable of generating text in Qarachay-Malqar, Russian, and English, with support for other languages from the base Qwen3 model.
- Extended Context Length: Offers a substantial context window of 40960 tokens, allowing for processing longer inputs and maintaining coherence over extended conversations or documents.
- Optimized Training: Underwent a multi-stage training process including tokenizer expansion, embeddings-only training, and full fine-tuning of all model layers.
Good for
- Qarachay-Malqar Language Applications: Ideal for chatbots, content generation, translation, and research focused on the Qarachay-Malqar language.
- Multilingual Communication: Useful in scenarios requiring interaction across Qarachay-Malqar, Russian, and English.
- Text Generation: Suited for tasks involving creative writing, summarization, and question-answering in its supported languages.
Limitations
- The model was fine-tuned on text data (continued pretraining) rather than dialogues, which may affect its conversational abilities.
- It may occasionally switch between languages within a single response.
- Additional instruction tuning is recommended for improved instruction following.