tartuNLP/llama-estllm-prototype-0825

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Sep 1, 2025 · License: llama3.1 · Architecture: Transformer

tartuNLP/llama-estllm-prototype-0825 is an 8-billion-parameter instruction-following causal language model developed by TartuNLP and TalTechNLP. It is a prototype version of EstLLM, created by continued pre-training of Llama-3.1-8B on 35B tokens with a focus on enhancing Estonian language capabilities. The model performs well on Estonian language competence and instruction-following tasks and serves as a baseline for future improvements in multilingual LLMs.


EstLLM Prototype 0825 Instruct: An Estonian-Enhanced LLM

This model, developed by TartuNLP and TalTechNLP, is the initial prototype from the EstLLM project, designed to establish a baseline for improving Estonian language capabilities in large language models. It is based on Meta's Llama-3.1-8B, having undergone continuous pre-training on approximately 35 billion tokens, followed by supervised fine-tuning and direct preference optimization.
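Because the model is fine-tuned from Llama-3.1-8B for single-turn instruction following, prompts presumably use the Llama 3.1 chat format. A minimal sketch of assembling such a single-turn prompt by hand follows; the special tokens are assumed from the base Llama 3.1 template, and in practice `tokenizer.apply_chat_template()` from the `transformers` library is the safer route:

```python
from typing import Optional


def build_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Build a single-turn prompt in the Llama 3.1 chat format.

    NOTE: the special tokens below are assumed from the base
    Llama-3.1-8B chat template; verify against the model's own
    tokenizer config before relying on them.
    """
    parts = ["<|begin_of_text|>"]
    if system_message is not None:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
    )
    # Open the assistant turn so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


# Example: an Estonian instruction ("Translate into English: Good morning!")
prompt = build_prompt("Tõlgi inglise keelde: Tere hommikust!")
```

The resulting string can be passed to the tokenizer and `generate()` as a plain text prompt; since the model does not yet support multi-turn conversations, each request should be formatted as a fresh single-turn prompt like the one above.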

Key Capabilities & Features

  • Bilingual Support: Optimized for both Estonian and English, with a strong focus on Estonian language competence.
  • Instruction Following: Demonstrates solid performance in instruction-following tasks for both Estonian (IFEval-et) and English (IFEval-en).
  • Estonian Language Competence: Shows competitive results in Estonian grammar, inflection, and word meanings, achieving a notable 0.9569 on Word-Meanings-et.
  • Knowledge & Reasoning: Performs well in Estonian knowledge and reasoning benchmarks like Winogrande-et and Trivia-et.
  • Translation: Achieves a BLEU score of 0.264 for English to Estonian translation on the wmt24pp dataset, outperforming several comparable models.

Good for

  • Estonian Language Applications: Ideal for developing chatbots, language tools, and other applications requiring strong Estonian language understanding and generation.
  • Baseline Evaluation: Useful for researchers and developers looking to evaluate and build upon a foundational Estonian-enhanced LLM.
  • Multilingual Research: Provides a valuable resource for studying the impact of continued pre-training and fine-tuning on low-resource languages within multilingual contexts.

Limitations

As an early prototype, the model has a relatively short context length of 4096 tokens and does not yet support multi-turn conversations. It also inherits the hard-coded knowledge cut-off date from the original Llama 3.1 system prompt.