tartuNLP/Llama-3.1-EstLLM-8B-0525

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Dec 20, 2025 · License: llama3.1 · Architecture: Transformer

tartuNLP/Llama-3.1-EstLLM-8B-0525 is an 8-billion-parameter causal language model developed by TartuNLP and TalTechNLP, produced by continued pre-training of meta-llama/Llama-3.1-8B on approximately 35 billion additional tokens, drawn largely from Estonian, Python, and mathematics datasets. It is a base model optimized for Estonian language capability and is intended for further fine-tuning on downstream tasks rather than direct instruction following. It outperforms its base model and several other 8B-class models on a range of Estonian benchmarks and translation metrics.


Model Overview

tartuNLP/Llama-3.1-EstLLM-8B-0525 is an 8-billion-parameter base text-completion model developed by TartuNLP and TalTechNLP and funded by the Estonian Ministry of Education and Research. It was created by continued pre-training of the original meta-llama/Llama-3.1-8B model on an additional 35 billion tokens. The training mix includes the Estonian National Corpus (8.6B tokens), Python-Edu (3.3B tokens), FineMath4-Plus (9.5B tokens), General Instruction-Augmented Corpora (7.4B tokens), and Cosmopedia v2 (6.9B tokens).
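As a base (non-instruct) model, it can be loaded with the standard `transformers` causal-LM API. The sketch below is illustrative only: the model ID is taken from this card, but the Estonian prompt, the bfloat16 dtype, and the available GPU memory are all assumptions.

```python
# Minimal text-completion sketch for a base (non-instruct) model.
# Assumes the `transformers` library and enough GPU memory for an 8B model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Llama-3.1-EstLLM-8B-0525"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable hardware
    device_map="auto",
)

# Base models continue text rather than follow instructions,
# so prompt with the beginning of the passage you want completed.
prompt = "Eesti keel on"  # "The Estonian language is ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```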

Key Capabilities

  • Enhanced Estonian Language Proficiency: Significantly improved performance on various Estonian benchmarks, including belebele-et, exam-et, grammar-et, inflection-et, trivia-et, winogrande-et, xcopa-et, and GlobalPIQA-et, often surpassing its Llama 3.1 base and other comparable models.
  • Multilingual Support: While primarily focused on Estonian, it also maintains strong English language capabilities, as evidenced by its performance on belebele-en, MMLU-Redux, and winogrande benchmarks.
  • Translation Performance: Achieves competitive BLEU scores for Estonian to English and English to Estonian translation tasks.
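BLEU, the metric cited above, scores n-gram overlap between a hypothesis translation and a reference, discounted by a brevity penalty. The following is a simplified, self-contained illustration of sentence-level BLEU; reported scores are normally computed with a standard tool such as sacreBLEU, and the exact evaluation setup is not stated here.

```python
# Simplified sentence-level BLEU: geometric mean of modified 1..4-gram
# precisions, times a brevity penalty. Illustrative only.
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_ngrams = ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero n-gram precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: 1 if the hypothesis is at least as long as the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(bleu("tere tulemast koju mu sõber", "tere tulemast koju mu sõber"))  # → 1.0
```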

Good For

  • Fine-tuning for Estonian NLP tasks: This model is explicitly designed as a base model for further fine-tuning on specific downstream tasks requiring strong Estonian language understanding and generation.
  • Research and Development: Ideal for researchers exploring continued pre-training techniques and developing specialized LLMs for less-resourced languages like Estonian.

Limitations

  • Base Model: It is a base text completion model and not instruction-tuned, meaning it is not suitable for direct chat or instruction-following without further fine-tuning.
  • Context Size: The continued training was performed with a sequence length of 4096 tokens, which may result in a somewhat limited effective context size compared to models trained with longer contexts.
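When fine-tuning, one conservative choice is to cap examples at the 4096-token sequence length used during continued training. A minimal sketch, assuming the standard `transformers` tokenizer for this model (downloading it requires network access):

```python
# Truncate fine-tuning examples to the 4096-token sequence length used
# during continued pre-training (a conservative assumption; longer
# inputs may still work, but were not seen during this training stage).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tartuNLP/Llama-3.1-EstLLM-8B-0525")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without one

def encode(texts, max_length=4096):
    return tokenizer(
        texts,
        truncation=True,
        max_length=max_length,
        padding="max_length",
        return_tensors="pt",
    )
```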