Model Overview
tartuNLP/Llama-3.1-EstLLM-8B-0525 is an 8-billion-parameter base text completion model developed by TartuNLP and TalTechNLP and funded by the Estonian Ministry of Education and Research. It was produced by continued pre-training of meta-llama/Llama-3.1-8B on approximately 35 billion additional tokens. The training data comprises the Estonian National Corpus (8.6B tokens), Python-Edu (3.3B tokens), FineMath4-Plus (9.5B tokens), General Instruction-Augmented Corpora (7.4B tokens), and Cosmopedia v2 (6.9B tokens).
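As a quick orientation, the snippet below loads the model for plain text completion with Hugging Face transformers. The bf16 dtype and device_map settings are assumptions about typical single-GPU inference, not requirements stated in this card.

```python
# Minimal text-completion example with Hugging Face transformers.
# dtype and device placement are assumptions; adjust for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Llama-3.1-EstLLM-8B-0525"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Base model: supply a plain text prefix to be continued, not a chat prompt.
prompt = "Eesti keel kuulub soome-ugri keelte hulka ja"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```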
Key Capabilities
- Enhanced Estonian Language Proficiency: Significantly improved performance on various Estonian benchmarks, including belebele-et, exam-et, grammar-et, inflection-et, trivia-et, winogrande-et, xcopa-et, and GlobalPIQA-et, often surpassing its Llama 3.1 base and other comparable models.
- Multilingual Support: While primarily focused on Estonian, the model retains strong English language capabilities, as evidenced by its performance on belebele-en, MMLU-Redux, and winogrande.
- Translation Performance: Achieves competitive BLEU scores for Estonian-to-English and English-to-Estonian translation (see the few-shot prompting sketch after this list).
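Because this is a base completion model, translation is typically elicited with few-shot prompting rather than an instruction. The sketch below reuses the model and tokenizer from the loading example above; the prompt template and example sentence pairs are illustrative, not taken from the card's evaluation setup.

```python
# Few-shot Estonian-to-English translation prompt for a base completion model.
# Example pairs are illustrative; reuses `tokenizer` and `model` from above.
few_shot = (
    "Estonian: Tere hommikust!\nEnglish: Good morning!\n\n"
    "Estonian: Ilm on täna ilus.\nEnglish: The weather is nice today.\n\n"
    "Estonian: Ma õpin eesti keelt.\nEnglish:"
)
inputs = tokenizer(few_shot, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
# Decode only the newly generated tokens, i.e. the model's translation.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```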
Good For
- Fine-tuning for Estonian NLP tasks: This model is explicitly designed as a base model for further fine-tuning on downstream tasks that require strong Estonian language understanding and generation (a minimal fine-tuning sketch follows this list).
- Research and Development: Ideal for researchers exploring continued pre-training techniques and developing specialized LLMs for less-resourced languages like Estonian.
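As a starting point for the fine-tuning use case above, here is a minimal LoRA sketch using peft and the transformers Trainer. The dataset name my_estonian_corpus is hypothetical, and the hyperparameters and target modules are common defaults rather than values recommended by the authors.

```python
# Minimal LoRA fine-tuning sketch with peft + transformers Trainer.
# Dataset name, hyperparameters, and target modules are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "tartuNLP/Llama-3.1-EstLLM-8B-0525"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach low-rank adapters to the attention projections (a common Llama choice).
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# "my_estonian_corpus" is a hypothetical dataset with a "text" column.
dataset = load_dataset("my_estonian_corpus", split="train")

def tokenize(batch):
    # Cap sequences at the 4096-token length used in continued pre-training.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="estllm-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1, bf16=True, logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```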
Limitations
- Base Model: It is a base text completion model and not instruction-tuned, meaning it is not suitable for direct chat or instruction-following without further fine-tuning.
- Context Size: The continued pre-training used a sequence length of 4096 tokens, so the effective context window may be shorter than that of models trained with longer sequences (see the truncation sketch below).
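Given the 4096-token training window, one conservative approach is to truncate inputs so that the prompt plus the generation budget stays within it. A minimal sketch, reusing the model and tokenizer from the loading example above; the 256-token output budget is an arbitrary choice.

```python
# Keep prompt + generation within the 4096-token training window.
MAX_CONTEXT = 4096        # sequence length used in continued pre-training
OUTPUT_BUDGET = 256       # arbitrary reservation for generated tokens

long_document = "Pikk eestikeelne tekst ..."  # hypothetical long input
inputs = tokenizer(long_document, return_tensors="pt",
                   truncation=True,
                   max_length=MAX_CONTEXT - OUTPUT_BUDGET).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=OUTPUT_BUDGET)
```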