cjvt/GaMS-9B-Instruct-Lex

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:16kPublished:Sep 24, 2025License:gemmaArchitecture:Transformer0.0K Warm

cjvt/GaMS-9B-Instruct-Lex is a 9 billion parameter instruction-tuned model developed by the University of Ljubljana, Faculty of Computer and Information Science. Based on cjvt/GaMS-9B-Instruct, it is specifically fine-tuned on a large corpus of Slovene lexicographic question-answer pairs. This model excels at providing precise and consistent answers to Slovene lexicographic queries, including definitions, synonyms, word forms, and contextual usage, with a context length of 16384 tokens. It is primarily designed to support computational lexicography, digital dictionaries, and language learners for the Slovene language.

Loading preview...

Overview

cjvt/GaMS-9B-Instruct-Lex is a specialized 9 billion parameter instruction-tuned model developed by the University of Ljubljana, Faculty of Computer and Information Science, as part of the LLM4DH project. It is built upon the cjvt/GaMS-9B-Instruct base model and has been extensively fine-tuned on a unique dataset of Slovene lexicographic question-answer pairs. This fine-tuning process involved leveraging various Slovene lexical resources like Digitalna slovarska baza (DSB), a Bridge dictionary, and the Slovene Language Advisory Service, with GPT-4.1 used for paraphrasing and diversifying auto-generated QA pairs.

Key Capabilities

  • Slovene Lexicographic Expertise: Optimized for answering specific questions related to Slovene words, including definitions, synonyms, word forms, collocations, and sense distinctions.
  • Multilingual Support: While primarily focused on Slovene, it also has secondary language support for English, Croatian, and Bosnian/Serbian.
  • Research and Learning Aid: Designed to assist in computational lexicography research, digital dictionary development, and language learning for Slovene.

Intended Use Cases

  • Answering Lexicographic Questions: Ideal for applications requiring precise answers about Slovene word meanings and usage.
  • Computational Lexicography: Supports research and development in the field of digital dictionaries and linguistic analysis.
  • Language Learning: Useful for learners and linguists seeking detailed information on Slovene vocabulary and grammar.

Limitations

  • Performance is optimized for Slovene lexicographic tasks; its effectiveness may vary in other domains or languages.
  • Potential for noise or inconsistencies due to the inclusion of automatically generated training data.
  • May occasionally produce incorrect or incomplete answers, particularly for rare or highly context-dependent queries.