GaMS-9B-Instruct: A Multilingual Gemma 2-based LLM
The cjvt/GaMS-9B-Instruct model is a 9-billion-parameter instruction-tuned language model, part of the larger GaMS (Generative Model for Slovene) family developed by researchers at the University of Ljubljana. It is built on Google's Gemma 2 architecture and has undergone extensive continual pre-training on a diverse corpus of Slovene, English, Croatian, Bosnian, and Serbian data, making it particularly adept at handling these languages.
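Since the model is published on the Hugging Face Hub, it can be loaded with the standard transformers text-generation pipeline. The sketch below is illustrative: the model ID comes from the source, but the generation settings (bf16 precision, `device_map="auto"`, token budget) are assumptions, not recommendations from the GaMS authors.

```python
# Minimal sketch: single-turn generation with cjvt/GaMS-9B-Instruct
# via the Hugging Face transformers text-generation pipeline.
MODEL_ID = "cjvt/GaMS-9B-Instruct"


def build_messages(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format the pipeline expects."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run one chat turn through the model and return the reply text."""
    # Heavyweight imports kept local so the helper above stays importable
    # without torch/transformers installed.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 to roughly halve memory
        device_map="auto",           # spread the 9B weights across available GPUs
    )
    out = pipe(build_messages(prompt), max_new_tokens=max_new_tokens)
    # The pipeline returns the full conversation; the last message is the reply.
    return out[0]["generated_text"][-1]["content"]


if __name__ == "__main__":
    print(generate("Napiši kratko pesem o Ljubljani."))  # Slovene prompt
```

A 9B model in bf16 needs roughly 18 GB of accelerator memory, so smaller GPUs would need quantized loading instead.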
Key Capabilities
- Multilingual Proficiency: Strong performance in Slovene, English, Croatian, Bosnian, and Serbian, with potential for other Gemma 2-supported languages.
- Instruction Following: Fine-tuned for general instruction-following tasks, enabling conversational AI and response generation.
- Robust Training: Continually pre-trained in two stages: first on parallel English-Slovene/Croatian corpora for cross-lingual alignment, then on large separate corpora for each language (13.62 billion tokens in total).
- Supervised Fine-tuning (SFT): Trained on approximately 25,000 SFT examples from various datasets, including specialized Slovene instruction datasets and filtered parallel corpora.
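Because the model is instruction-tuned on the Gemma 2 base, its prompts follow Gemma 2's turn markup. The sketch below assembles that markup by hand so the structure is visible; in practice `tokenizer.apply_chat_template` does this (and the tokenizer prepends the `<bos>` token). The assumption that GaMS-9B-Instruct keeps the stock Gemma 2 template unchanged is ours.

```python
# Sketch of Gemma 2's chat turn format, written out as plain string
# assembly so the prompt structure is explicit.
def to_gemma_prompt(messages: list[dict]) -> str:
    """Render {'role', 'content'} dicts into Gemma 2 turn markup."""
    parts = []
    for m in messages:
        # Gemma 2 uses the role name "model" for assistant turns.
        role = "model" if m["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(parts)


prompt = to_gemma_prompt(
    [{"role": "user", "content": "Prevedi v angleščino: Dober dan!"}]
)
```

Multi-turn conversations are encoded the same way, alternating `user` and `model` turns before the final response cue.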
Evaluation Highlights
- Slovenian-LLM-Eval: Demonstrates competitive performance against other models, including base Gemma 2 and SlovenianGPT.
- SloBench SuperGLUE: Achieves an average score of 0.6997 on Slovene SuperGLUE tasks in the 0-shot setting.
- Translation Tasks: Ranks highly in English-to-Slovene and Slovene-to-English translation benchmarks, outperforming several other models in its class.
Intended Use Cases
- Content Creation: Generating text in supported languages, including creative formats.
- Conversational AI: Powering chatbots and virtual assistants, especially for multilingual applications.
- Research and Education: Serving as a foundation for NLP research and language learning tools focused on the specified languages.