tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Jan 20, 2026 · License: llama3.1 · Architecture: Transformer

Llama-3.1-EstLLM-8B-Instruct-0825 is an 8 billion parameter instruction-following causal language model developed by TartuNLP and TalTechNLP. It is continually pre-trained from Meta's Llama-3.1-8B on roughly 35B additional tokens to strengthen Estonian language capabilities, then fine-tuned for instruction following. The model performs well on Estonian language tasks, particularly instruction following and multiple-choice questions, and supports both Estonian and English.


Llama-3.1-EstLLM-8B-Instruct-0825: Estonian-Enhanced LLM

This model is the first prototype from the EstLLM project, developed by TartuNLP and TalTechNLP, and funded by the Estonian Ministry of Education and Research. It is an 8 billion parameter instruction-following model, continuously pre-trained from Meta's Llama-3.1-8B, with a context length of 32768 tokens.
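Since the limitations section notes that the model inherits the base Llama 3.1 system prompt, a reasonable assumption is that it also keeps the standard Llama 3.1 chat format. The sketch below assembles a single-turn prompt by hand to make that format visible; in real use you would load the repository's own tokenizer and call `tokenizer.apply_chat_template` instead of hard-coding the template.

```python
# Sketch of a single-turn Llama 3.1-style chat prompt.
# Assumption: tartuNLP/Llama-3.1-EstLLM-8B-Instruct-0825 keeps the standard
# Llama 3.1 template; prefer tokenizer.apply_chat_template in practice.

def build_prompt(user_message: str,
                 system_message: str = "You are a helpful assistant.") -> str:
    """Assemble one single-turn prompt (multi-turn is not officially supported)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example: an English-to-Estonian translation request, one of the tasks
# the model card highlights.
prompt = build_prompt("Tõlgi eesti keelde: The weather is nice today.")
```

The resulting string would then be tokenized and passed to `model.generate`, stopping on the `<|eot_id|>` token.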

Key Capabilities & Training

  • Bilingual Support: Optimized for both Estonian and English, with a strong focus on Estonian language competence.
  • Continuous Pre-training: Underwent continuous pre-training on approximately 35 billion tokens, including 8.6B tokens from the Estonian National Corpus, plus Python-Edu, FineMath4-Plus, and general instruction-augmented corpora.
  • Fine-tuning: Uses Supervised Fine-Tuning (SFT) on 764k examples, drawn primarily from the Tulu 3 SFT mixture and EuroBlocks-SFT-Synthetic, with about 80% of examples in English. Direct Preference Optimization (DPO) was then applied using HelpSteer3.
  • Instruction Following: Demonstrates competitive performance in Estonian instruction-following (IFEval-et) and strong results in Estonian multiple-choice tasks like Word-Meanings-et and Trivia-et.
  • Translation: Shows promising BLEU scores for English to Estonian translation on the wmt24pp benchmark.

Limitations

As an early prototype, its effective context is only about 4096 tokens despite the 32,768-token maximum, and multi-turn conversations are not officially supported in this version. It also inherits the knowledge cut-off date hard-coded in the base Llama 3.1 system prompt.
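Because the effective context (4096 tokens) is far below the architectural maximum, long prompts are best clamped before generation. A minimal sketch, where plain integer token IDs stand in for a real tokenizer's output and both the keep-the-tail strategy and the default `max_new_tokens` value are illustrative choices, not part of the model card:

```python
EFFECTIVE_CTX = 4096  # effective window from the limitations, not the 32k maximum

def clamp_to_effective_context(token_ids: list[int],
                               max_new_tokens: int = 512,
                               budget: int = EFFECTIVE_CTX) -> list[int]:
    """Keep the most recent tokens so prompt + generation fits the effective window."""
    keep = budget - max_new_tokens
    if keep <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep the tail: the most recent context is usually the most relevant.
    return token_ids[-keep:] if len(token_ids) > keep else token_ids

# Example: a 5000-token prompt is trimmed to the last 3584 tokens
# (4096 budget minus 512 reserved for generation).
clamped = clamp_to_effective_context(list(range(5000)))
```

For document-level tasks, chunking the input and querying each chunk separately is a safer strategy than relying on the full 32k window.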

Good for

  • Developing applications requiring strong Estonian language understanding and generation.
  • Research and development in multilingual LLMs, particularly for low-resource languages.
  • Establishing baselines for future improvements in Estonian language models.