tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Nov 28, 2025License:llama3.1Architecture:Transformer0.0K Warm

The tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 is an 8 billion parameter instruction-following causal language model developed by TartuNLP and TalTechNLP research groups. Built upon Meta's Llama-3.1-8B, it underwent continuous pre-training on 35B tokens and subsequent supervised fine-tuning and direct preference optimization. This model is specifically optimized for strong performance in both Estonian and English, excelling in instruction-following and language competence tasks across both languages.

Loading preview...

Model Overview

Llama-3.1-EstLLM-8B-Instruct-1125 is an 8 billion parameter instruction-following causal language model developed by the TartuNLP and TalTechNLP research groups, funded by the Estonian Ministry of Education and Research. It is based on meta-llama/Llama-3.1-8B and has undergone extensive continued pre-training on approximately 35 billion tokens, followed by supervised fine-tuning (SFT) and direct preference optimization (DPO).

Key Capabilities & Training

  • Bilingual Proficiency: Optimized for both Estonian and English, with continued pre-training on a diverse dataset including Estonian National Corpus, Python-Edu, FineMath4-Plus, General Instruction-Augmented Corpora, and Cosmopedia v2.
  • Instruction Following: Demonstrates strong instruction-following capabilities in both Estonian (IFEval-et score of 0.6141, an improvement over its predecessor) and English (IFEval-en score of 0.8173, also an improvement).
  • Language Competence: Achieves notable scores in Estonian language competence benchmarks, including Grammar-et (0.8310), Inflection-et (0.5777), and Word-Meanings-et (0.9619), showing significant improvements.
  • Knowledge & Reasoning: Performs well in Estonian knowledge and reasoning tasks like Winogrande-et (0.6440) and Trivia-et (0.4288), and competitive scores in English benchmarks such as GSM8K (0.7726).
  • Translation: Shows strong performance in English to Estonian translation, achieving a BLEU score of 0.2635 on wmt24pp, making it a competitive option for this specific translation direction.

Limitations

  • Context Length: Has a relatively short context of 4096 tokens, and performance beyond this is not guaranteed.
  • Multi-turn Conversations: While improved by merging, multi-turn conversation support is not fully guaranteed.
  • Date Cut-off: Inherits the original Llama 3.1 system prompt's hard-coded date cut-off.

When to Use This Model

This model is particularly well-suited for applications requiring robust instruction-following and strong language understanding in Estonian and English, especially where high performance on Estonian-specific language competence and knowledge tasks is critical. Its translation capabilities from English to Estonian also make it valuable for relevant localization and multilingual content generation tasks.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p