Poro 2 8B Base: Finnish-Enhanced Llama 3.1
Poro 2 8B Base is an 8 billion parameter decoder-only transformer model, developed through a collaboration between AMD Silo AI, the University of Turku's TurkuNLP group, and High Performance Language Technologies (HPLT). It was created by continued pretraining of Llama 3.1 8B, with the goal of adding robust Finnish language capabilities while preserving Llama 3.1's strengths in English, code, and mathematics.
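The snippet below is a minimal sketch of loading the base model for plain text completion with Hugging Face transformers. The repository id, prompt, and generation settings are illustrative assumptions, not values taken from the official model card.

```python
# Minimal sketch: load the base model and generate a Finnish continuation.
# The repository id is an assumption; check the official model card for the exact name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LumiOpen/Llama-Poro-2-8B-base"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B parameters fit in roughly 16 GB at bf16
    device_map="auto",
)

# A base model continues text; it is not tuned to follow instructions.
prompt = "Suomen kielen erityispiirteitä ovat"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```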
Key Capabilities & Training
- Multilingual Proficiency: Achieves substantial improvements on Finnish benchmarks (approximately +10% on average across ARC Challenge, HellaSwag, MMLU, and TruthfulQA) compared to Llama 3.1 8B, with only a minor decrease in English performance.
- Translation: Demonstrates strong translation capability, with significant BLEU score improvements over Llama 3.1 8B in both EN→FI and FI→EN directions (see the prompting sketch after this list).
- Architecture: Based on Llama 3.1 8B, featuring 32 layers, 32 heads, and an 8192 token maximum sequence length.
- Training Data: Continued pretraining used 165 billion tokens, drawn from Finnish (30%), English (30%), code (30%), and math (10%) datasets.
- Open Source: Released under the Llama 3.1 Community License, promoting transparency and community use.
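Because the base model is a plain completion model, translation is typically elicited with a few-shot prompt rather than an instruction. The sketch below illustrates one such prompt; the repository id and prompt format are assumptions and not the evaluation protocol behind the reported BLEU scores.

```python
# Few-shot EN→FI translation via text completion. Prompt format and repo id
# are illustrative assumptions, not the setup used for the BLEU evaluations.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="LumiOpen/Llama-Poro-2-8B-base",  # assumed Hugging Face repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

few_shot_prompt = (
    "English: The weather is beautiful today.\n"
    "Finnish: Sää on tänään kaunis.\n\n"
    "English: Where is the nearest train station?\n"
    "Finnish: Missä on lähin rautatieasema?\n\n"
    "English: Thank you very much for your help.\n"
    "Finnish:"
)

result = generator(few_shot_prompt, max_new_tokens=32, do_sample=False, return_full_text=False)
# Keep only the first generated line as the translation.
print(result[0]["generated_text"].strip().split("\n")[0])
```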
Usage Considerations
This is a base model and requires further fine-tuning, such as supervised fine-tuning (SFT) or preference optimization with DPO, for most practical use cases; a minimal SFT sketch follows below. The Poro 2 family also includes SFT and Instruct versions, as well as larger 70B parameter models. As with all large language models, users should be aware of potential biases or inaccuracies stemming from the training data.
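As one possible path toward an instruction-following model, the sketch below outlines supervised fine-tuning with the TRL library. The repository id, the tiny in-memory dataset, and the hyperparameters are placeholder assumptions and do not reproduce the recipe behind the official Poro 2 SFT and Instruct models.

```python
# Minimal SFT sketch with TRL. Repo id, toy dataset, and hyperparameters are
# placeholder assumptions; they do not reproduce the official Poro 2 SFT recipe.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Tiny in-memory dataset with a plain "text" column, for illustration only.
train_dataset = Dataset.from_dict({
    "text": [
        "Käyttäjä: Mikä on Suomen pääkaupunki?\nAvustaja: Suomen pääkaupunki on Helsinki.",
        "Käyttäjä: Käännä suomeksi: 'Good morning.'\nAvustaja: Hyvää huomenta.",
    ]
})

training_args = SFTConfig(
    output_dir="poro2-8b-sft-demo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=2e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="LumiOpen/Llama-Poro-2-8B-base",  # assumed Hugging Face repo id
    train_dataset=train_dataset,
    args=training_args,
)
trainer.train()
```

In practice, one would train on a substantial instruction dataset (and possibly use parameter-efficient methods such as LoRA) rather than the toy examples shown here.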