LumiOpen/Llama-Poro-2-8B-base

Text generation · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: May 27, 2025 · License: Llama 3.1 · Architecture: Transformer

LumiOpen/Llama-Poro-2-8B-base is an 8 billion parameter decoder-only transformer model developed by AMD Silo AI, TurkuNLP, and HPLT. It extends Llama 3.1 8B through continued pretraining, adding Finnish language capabilities while maintaining strong English, code, and math proficiency. Trained on 165 billion tokens with an 8192 token context length, this base model significantly improves Finnish benchmark performance over its Llama 3.1 counterpart.
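As a standard Llama-architecture checkpoint, the model can be loaded with the Hugging Face `transformers` library. The sketch below is illustrative: the repo id comes from this card, but the dtype and device settings are assumptions, not requirements.

```python
# Minimal sketch: loading and prompting the base model with transformers.
# Assumptions: bfloat16 weights and automatic device placement; adjust to
# your hardware. An 8B model needs roughly 16 GB of memory in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LumiOpen/Llama-Poro-2-8B-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# This is a base model: it completes raw text and has no chat template.
prompt = "Suomen pääkaupunki on"  # "The capital of Finland is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because no instruction tuning has been applied, prompts should be phrased as text to be continued rather than as questions or commands.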


Poro 2 8B Base: Finnish-Enhanced Llama 3.1

Poro 2 8B Base is an 8 billion parameter decoder-only transformer model developed through a collaboration between AMD Silo AI, the University of Turku's TurkuNLP group, and High Performance Language Technologies (HPLT). It extends the Llama 3.1 8B model through continued pretraining, specifically engineered to integrate robust Finnish language capabilities while preserving the original model's strengths in English, code, and mathematics.

Key Capabilities & Training

  • Multilingual Proficiency: Achieves substantial improvements in Finnish benchmarks (e.g., +10% average on ARC Challenge, HellaSwag, MMLU, TruthfulQA) compared to Llama 3.1 8B, with only a minor decrease in English performance.
  • Translation: Demonstrates strong translation capabilities, with significant BLEU score improvements for both EN→FI and FI→EN translation over Llama 3.1 8B.
  • Architecture: Based on Llama 3.1 8B, featuring 32 layers, 32 heads, and an 8192 token maximum sequence length.
  • Training Data: Pretrained on 165 billion tokens, comprising a balanced mix of Finnish (30%), English (30%), Code (30%), and Math (10%) datasets.
  • Open Source: Released under the Llama 3.1 Community License, promoting transparency and community use.
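Since the base model has no chat template, translation is typically elicited with a few-shot completion prompt. The helper below is a hypothetical sketch of that pattern; the example sentence pairs are illustrative, not from the training data.

```python
# Sketch: building a few-shot EN->FI translation prompt for a base
# (non-instruct) model. The model is expected to continue the final
# "Finnish:" line with the translation.
def build_translation_prompt(sentence: str,
                             examples: list[tuple[str, str]]) -> str:
    """Format example (English, Finnish) pairs, then the sentence to translate."""
    blocks = [f"English: {en}\nFinnish: {fi}" for en, fi in examples]
    blocks.append(f"English: {sentence}\nFinnish:")
    return "\n\n".join(blocks)

examples = [
    ("Good morning.", "Hyvää huomenta."),
    ("Thank you very much.", "Kiitos paljon."),
]
prompt = build_translation_prompt("How are you?", examples)
print(prompt)
```

The resulting string would be passed to `model.generate` as a plain completion; stopping at the first newline in the output keeps only the translated sentence.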

Usage Considerations

This is a base model and requires further fine-tuning (e.g., SFT or DPO) for most practical use cases. The Poro 2 family also includes SFT and Instruct versions, as well as larger 70B parameter models. As with all large language models, users should be aware of potential biases and inaccuracies stemming from the training data.