Model Overview
Llama-3.1-EstLLM-8B-Instruct-1125 is an 8 billion parameter instruction-following causal language model developed by the TartuNLP and TalTechNLP research groups, funded by the Estonian Ministry of Education and Research. It is based on meta-llama/Llama-3.1-8B and has undergone extensive continued pre-training on approximately 35 billion tokens, followed by supervised fine-tuning (SFT) and direct preference optimization (DPO).
Key Capabilities & Training
- Bilingual Proficiency: Optimized for both Estonian and English, with continued pre-training on a diverse dataset including Estonian National Corpus, Python-Edu, FineMath4-Plus, General Instruction-Augmented Corpora, and Cosmopedia v2.
- Instruction Following: Demonstrates strong instruction-following capabilities in both Estonian (IFEval-et score of 0.6141, an improvement over its predecessor) and English (IFEval-en score of 0.8173, also an improvement).
- Language Competence: Achieves notable scores in Estonian language competence benchmarks, including Grammar-et (0.8310), Inflection-et (0.5777), and Word-Meanings-et (0.9619), showing significant improvements.
- Knowledge & Reasoning: Performs well in Estonian knowledge and reasoning tasks like Winogrande-et (0.6440) and Trivia-et (0.4288), and competitive scores in English benchmarks such as GSM8K (0.7726).
- Translation: Shows strong performance in English to Estonian translation, achieving a BLEU score of 0.2635 on wmt24pp, making it a competitive option for this specific translation direction.
Limitations
- Context Length: Has a relatively short context of 4096 tokens, and performance beyond this is not guaranteed.
- Multi-turn Conversations: While improved by merging, multi-turn conversation support is not fully guaranteed.
- Date Cut-off: Inherits the original Llama 3.1 system prompt's hard-coded date cut-off.
When to Use This Model
This model is particularly well-suited for applications requiring robust instruction-following and strong language understanding in Estonian and English, especially where high performance on Estonian-specific language competence and knowledge tasks is critical. Its translation capabilities from English to Estonian also make it valuable for relevant localization and multilingual content generation tasks.