trajkovnikola/MKLLM-7B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Jun 16, 2024 · License: cc-by-nc-sa-4.0 · Architecture: Transformer · Open Weights

trajkovnikola/MKLLM-7B-Instruct is a 7 billion parameter instruction-tuned language model developed by trajkovnikola, built upon Mistral-7B-v0.1 and optimized for the Macedonian language through continued pretraining on a mixed Macedonian and English text corpus. It excels at understanding and processing Macedonian, outperforming models such as Llama3-8B-Instruct and Mistral-7B-Instruct-v0.3 on Macedonian benchmarks.


MKLLM-7B-Instruct Overview

MKLLM-7B-Instruct is a 7 billion parameter instruction-tuned Large Language Model developed by trajkovnikola, specifically designed for the Macedonian language. It is based on the Mistral-7B-v0.1 architecture and was further pretrained on a mixed corpus of Macedonian and English text, totaling approximately 300 million tokens over two epochs. This continued pretraining has resulted in a model highly capable of understanding and processing Macedonian.

Key Capabilities and Performance

  • Macedonian Language Proficiency: Demonstrates strong capabilities in understanding and generating coherent Macedonian text.
  • Instruction Following: Instruction-tuned using the chatml format, enabling effective conversational interactions.
  • Benchmark Performance: Outperforms Meta's Llama3-8B-Instruct and Mistral's Mistral-7B-Instruct-v0.3 on Macedonian-translated benchmarks, particularly in understanding tasks. The developers also note superior generation capabilities and fluency in Macedonian.
  • Base Model: Built on the robust Mistral-7B-v0.1 foundation.

Usage and Limitations

  • Chat Template: Uses the chatml format for prompting, which can be applied with tokenizer.apply_chat_template(); see the sketch after this list.
  • Hallucination: Like other LLMs, the model can hallucinate and produce factually incorrect output, particularly on Macedonian-specific topics, owing to the relatively small Macedonian portion of its training data.
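
As a quick illustration of the chatml prompting described above, here is a minimal sketch using the standard Hugging Face transformers API. The model ID comes from this page; the example prompt, precision setting, and generation parameters are illustrative assumptions rather than values from the model card.

```python
# Minimal sketch: prompting MKLLM-7B-Instruct through its chatml template.
# Assumes the standard Hugging Face transformers API; the prompt, dtype, and
# generation parameters below are illustrative, not from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trajkovnikola/MKLLM-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: half precision to fit on one GPU
    device_map="auto",
)

# chatml-style conversation; apply_chat_template() renders it into the
# format the model was instruction-tuned on and appends the assistant turn.
messages = [
    {"role": "user", "content": "Кажи ми нешто за Охридското Езеро."}  # "Tell me about Lake Ohrid."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```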