tartuNLP/Llammas

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Nov 29, 2023 · License: llama2 · Architecture: Transformer · Open Weights

Llammas is a 7 billion parameter Llama-2-based instruction-tuned language model developed by tartuNLP, optimized for the Estonian language. It underwent continued pre-training on 5 billion tokens from CulturaX, with 75% Estonian and 25% English documents, followed by instruction-tuning on a mix of English and Estonian datasets including Alpaca-est. It is the first open-source instruction-following LLM for Estonian, making it well suited to tasks that require understanding and generating the language.


Llammas: Estonian-Optimized Llama-2 Instruction Model

Llammas is a 7 billion parameter instruction-tuned model based on Llama-2, developed by tartuNLP with a primary focus on the Estonian language. It represents the first open-source instruction-following Large Language Model specifically designed for Estonian.

Key Capabilities & Training

  • Bilingual Pre-training: The model underwent continued pre-training on 5 billion tokens from the CulturaX dataset, with a significant 75% of documents in Estonian and 25% in English.
  • Instruction-Tuning: It was instruction-tuned using a diverse set of datasets, including Alpaca-cleaned, Alpaca-est (an Estonian instruction dataset generated with gpt-3.5-turbo-0613), OASST1 top-1 English conversations, CoT, and FLAN-V2, alongside WMT18 English-Estonian translation data.
  • Cross-Lingual Knowledge Transfer: The accompanying research shows that combining cross-lingual instruction-tuning with additional monolingual pre-training significantly improves performance on Estonian tasks.
  • Commonsense Reasoning & Multi-turn Conversations: The model demonstrates improved capabilities in commonsense reasoning and multi-turn conversations in Estonian, transferred from high-quality English instructions.
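To make the instruction-following workflow concrete, below is a minimal inference sketch using the Hugging Face transformers library. The Alpaca-style prompt template is an assumption drawn from the Alpaca-based instruction-tuning mix; consult the model card for the exact format the checkpoint expects.

```python
# Minimal inference sketch for tartuNLP/Llammas (assumed Alpaca-style prompting).

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in an Alpaca-style template (assumed format)."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

def generate(instruction: str, max_new_tokens: int = 200) -> str:
    # Imported lazily so the prompt helper works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tartuNLP/Llammas"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# Example (downloads the full 7B weights on first use):
#   print(generate("Kirjuta lühike luuletus Eesti sügisest."))
```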

Use Cases & Resources

Llammas is ideal for applications requiring robust language understanding and generation in Estonian. It is particularly suited for tasks that benefit from instruction-following capabilities. For those interested in its development and performance, tartuNLP has published a paper detailing its creation and evaluation. The associated Alpaca-est dataset is also publicly available.
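For readers exploring the Alpaca-est data, the sketch below converts Alpaca-style records (instruction / input / output fields) into supervised fine-tuning text. The field names follow the original Alpaca schema; whether Alpaca-est uses exactly this schema is an assumption to verify against the published dataset.

```python
# Sketch: turning an Alpaca-style record into fine-tuning text.
# The instruction/input/output schema is assumed from the original Alpaca format.

def record_to_text(record: dict) -> str:
    """Render one record as prompt + target, omitting the Input section when empty."""
    if record.get("input"):
        return (f"### Instruction:\n{record['instruction']}\n\n"
                f"### Input:\n{record['input']}\n\n"
                f"### Response:\n{record['output']}")
    return (f"### Instruction:\n{record['instruction']}\n\n"
            f"### Response:\n{record['output']}")

sample = {
    "instruction": "Tõlgi lause inglise keelde.",   # "Translate the sentence into English."
    "input": "Täna on ilus ilm.",
    "output": "The weather is beautiful today.",
}
print(record_to_text(sample))
```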