tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Nov 28, 2025 · License: llama3.1 · Architecture: Transformer

tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125 is an 8-billion-parameter instruction-following causal language model developed by the TartuNLP and TalTechNLP research groups. Built on Meta's Llama-3.1-8B, it underwent continued pre-training on 35B tokens, followed by supervised fine-tuning and direct preference optimization. The model is optimized for strong performance in both Estonian and English, with particular strength in instruction following and language competence in both languages.


Model Overview

Llama-3.1-EstLLM-8B-Instruct-1125 is an 8 billion parameter instruction-following causal language model developed by the TartuNLP and TalTechNLP research groups, funded by the Estonian Ministry of Education and Research. It is based on meta-llama/Llama-3.1-8B and has undergone extensive continued pre-training on approximately 35 billion tokens, followed by supervised fine-tuning (SFT) and direct preference optimization (DPO).
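
The model is distributed as a standard Hugging Face checkpoint, so it can be queried with the transformers library. Below is a minimal sketch assuming the Llama 3.1 chat template ships with the tokenizer; the Estonian prompt and the generation settings are illustrative, not recommendations from the model card.

```python
# Minimal sketch: load the model with Hugging Face transformers and query it
# in Estonian via the tokenizer's chat template. Prompt and generation
# settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # "Explain briefly what machine learning is."
    {"role": "user", "content": "Selgita lühidalt, mis on masinõpe."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```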

Key Capabilities & Training

  • Bilingual Proficiency: Optimized for both Estonian and English, with continued pre-training on a diverse dataset including Estonian National Corpus, Python-Edu, FineMath4-Plus, General Instruction-Augmented Corpora, and Cosmopedia v2.
  • Instruction Following: Demonstrates strong instruction-following capabilities in both Estonian (IFEval-et score of 0.6141, an improvement over its predecessor) and English (IFEval-en score of 0.8173, also an improvement).
  • Language Competence: Achieves notable scores in Estonian language competence benchmarks, including Grammar-et (0.8310), Inflection-et (0.5777), and Word-Meanings-et (0.9619), showing significant improvements.
  • Knowledge & Reasoning: Performs well in Estonian knowledge and reasoning tasks like Winogrande-et (0.6440) and Trivia-et (0.4288), and competitive scores in English benchmarks such as GSM8K (0.7726).
  • Translation: Shows strong English-to-Estonian translation performance, achieving a BLEU score of 0.2635 on wmt24pp, which makes it a competitive option for this translation direction (see the prompt sketch after this list).
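
For the translation use case above, a plain instruction prompt is one straightforward approach. The sketch below assumes a simple "Translate the following sentence into Estonian" instruction; this wording is an assumption and may differ from the prompting used in the wmt24pp evaluation.

```python
# Illustrative English-to-Estonian translation prompt. The instruction wording
# is an assumption, not necessarily the format used in the wmt24pp evaluation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

source = "The new library opens to the public next Monday."
messages = [
    {"role": "user", "content": f"Translate the following sentence into Estonian:\n\n{source}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```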

Limitations

  • Context Length: Has a relatively short effective context of 4096 tokens, and performance beyond this is not guaranteed (see the guard sketch after this list).
  • Multi-turn Conversations: While improved by merging, multi-turn conversation support is not fully guaranteed.
  • Date Cut-off: Inherits the original Llama 3.1 system prompt's hard-coded date cut-off.
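
Because of the short effective context, it can be useful to check prompt length before generation. A minimal sketch, assuming a 4096-token window and a fixed generation budget (the 256-token budget is an assumption for demonstration):

```python
# Illustrative guard for the 4096-token effective context noted above: count
# the tokenized prompt and fail early rather than generating past the limit.
from transformers import AutoTokenizer

MAX_CONTEXT = 4096
MAX_NEW_TOKENS = 256  # assumed generation budget

tokenizer = AutoTokenizer.from_pretrained("tartuNLP/Llama-3.1-EstLLM-8B-Instruct-1125")
messages = [{"role": "user", "content": "..."}]  # the conversation to send

prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
prompt_len = prompt_ids.shape[-1]
if prompt_len + MAX_NEW_TOKENS > MAX_CONTEXT:
    raise ValueError(
        f"Prompt is {prompt_len} tokens; adding {MAX_NEW_TOKENS} generated tokens "
        f"would exceed the {MAX_CONTEXT}-token effective context window."
    )
```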

When to Use This Model

This model is well-suited for applications that require robust instruction following and strong language understanding in Estonian and English, especially where high performance on Estonian-specific language competence and knowledge tasks is critical. Its English-to-Estonian translation capability also makes it valuable for localization and multilingual content generation.