LumiOpen/Llama-Poro-2-8B-SFT

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Jun 13, 2025 · License: llama3.3 · Architecture: Transformer

LumiOpen's Llama-Poro-2-8B-SFT is an 8 billion parameter supervised fine-tuned (SFT) model based on Llama 3.1 8B, designed for instruction following and conversational AI in both Finnish and English. Developed by a collaboration including AMD Silo AI and TurkuNLP, it serves as an intermediate checkpoint in the Poro 2 model family, preceding Direct Preference Optimization (DPO). This model demonstrates significant improvements in Finnish instruction-following capabilities compared to Llama 3.1 8B Instruct, while maintaining strong English performance, making it ideal for research into post-training techniques.


Poro 2 8B SFT: A Research-Focused Instruction-Following Model

LumiOpen's Poro 2 8B SFT is an 8 billion parameter supervised fine-tuned (SFT) model built upon the Llama 3.1 8B architecture. It is an intermediate checkpoint in the Poro 2 model family, specifically designed for instruction following and conversational AI in both Finnish and English. This model has not undergone preference tuning (DPO), making it a valuable resource for researchers studying the impact of different post-training methodologies.

Key Capabilities & Features

  • Bilingual Proficiency: Supports instruction following and conversation in both English and Finnish.
  • Supervised Fine-Tuning: Trained on 1.4 million instruction-following examples, including Tulu 3 prompts, multi-turn conversations, and translation samples.
  • Improved Finnish Performance: Shows substantial improvements in Finnish instruction-following benchmarks (e.g., IFEval Finnish, MTBench Finnish, AlpacaEval 2 Finnish) compared to Llama 3.1 8B Instruct.
  • Maintained English Performance: Retains strong performance in English instruction-following tasks.
  • Llama 3.1 Base: Benefits from the robust foundation of the Llama 3.1 8B model.
  • 8192 Token Context: Features a maximum sequence length of 8192 tokens.
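Because the model is fine-tuned from Llama 3.1 8B, it can be assumed to use the standard Llama 3.1 chat format for multi-turn conversations. The sketch below builds such a prompt string by hand purely to illustrate the turn structure; in practice you would let `tokenizer.apply_chat_template` from Hugging Face `transformers` do this, since the authoritative template ships with the model's tokenizer. The helper name and the example messages are illustrative, not part of the model card.

```python
def build_llama31_prompt(messages):
    """Sketch of a Llama 3.1-style chat prompt built from a list of
    {"role": ..., "content": ...} dicts. For real use, prefer
    tokenizer.apply_chat_template, which encodes the model's own template."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

# Bilingual example: an English system prompt with a Finnish user question.
prompt = build_llama31_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Mikä on Suomen pääkaupunki?"},
])
```

Keeping the prompt within the model's 8192-token maximum sequence length is the caller's responsibility when conversations grow long.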

Ideal Use Cases

This model is primarily intended for:

  • Research: Studying the effects of supervised fine-tuning versus preference tuning, and comparative analysis of post-training techniques.
  • Ablation Studies: Investigating the contribution of different training phases to instruction-following capabilities.
  • Educational Applications: Learning about the development process of instruction-following models.
  • Development: Serving as a starting point for further preference-tuning experiments; for production use, the DPO-tuned Poro 2 8B Instruct is recommended instead.