LumiOpen/Llama-Poro-2-8B-SFT
LumiOpen's Llama-Poro-2-8B-SFT is an 8 billion parameter supervised fine-tuned (SFT) model based on Llama 3.1 8B, designed for instruction following and conversational AI in both Finnish and English. Developed by a collaboration including AMD Silo AI and TurkuNLP, it serves as an intermediate checkpoint in the Poro 2 model family, preceding Direct Preference Optimization (DPO). The model shows significant improvements in Finnish instruction following over Llama 3.1 8B Instruct while maintaining strong English performance, and its pre-DPO status makes it well suited to research on post-training techniques.
Poro 2 8B SFT: A Research-Focused Instruction-Following Model
LumiOpen's Poro 2 8B SFT is an 8 billion parameter supervised fine-tuned (SFT) model built upon the Llama 3.1 8B architecture. It is an intermediate checkpoint in the Poro 2 model family, specifically designed for instruction following and conversational AI in both Finnish and English. This model has not undergone preference tuning (DPO), making it a valuable resource for researchers studying the impact of different post-training methodologies.
Key Capabilities & Features
- Bilingual Proficiency: Supports instruction following and conversation in both English and Finnish.
- Supervised Fine-Tuning: Trained on 1.4 million instruction-following examples, including Tulu 3 prompts, multi-turn conversations, and translation samples.
- Improved Finnish Performance: Shows substantial improvements in Finnish instruction-following benchmarks (e.g., IFEval Finnish, MTBench Finnish, AlpacaEval 2 Finnish) compared to Llama 3.1 8B Instruct.
- Maintained English Performance: Retains strong performance in English instruction-following tasks.
- Llama 3.1 Base: Benefits from the robust foundation of the Llama 3.1 8B model.
- 8192 Token Context: Features a maximum sequence length of 8192 tokens.
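As a Llama 3.1 derivative, the model presumably inherits the Llama 3.1 chat format. The sketch below hand-builds such a prompt purely for illustration of the layout; this is an assumption about the template, and in practice you would call the tokenizer's `apply_chat_template` instead:

```python
# Sketch of the Llama 3.1 chat prompt layout (assumed, since Poro 2 8B SFT
# is based on Llama 3.1; prefer tokenizer.apply_chat_template in real use).
def build_llama31_prompt(messages):
    """Assemble a Llama 3.1-style chat prompt from role/content dicts."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama31_prompt(
    [{"role": "user", "content": "Mikä on Suomen pääkaupunki?"}]
)
```

Note that the full prompt plus generated reply must fit within the 8192-token context window.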
Ideal Use Cases
This model is primarily intended for:
- Research: Studying the effects of supervised fine-tuning versus preference tuning, and comparative analysis of post-training techniques.
- Ablation Studies: Investigating the contribution of different training phases to instruction-following capabilities.
- Educational Applications: Learning about the development process of instruction-following models.
- Development: Serving as a starting point for further preference tuning experiments. For production use, the DPO-tuned Poro 2 8B Instruct is recommended.
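For any of these experiments, the checkpoint can be loaded like any other Llama 3.1 derivative through Hugging Face `transformers`. A minimal sketch, assuming a GPU with sufficient memory (the 8B weights need roughly 16 GB in bf16) and using only standard `transformers` calls:

```python
MODEL_ID = "LumiOpen/Llama-Poro-2-8B-SFT"

def chat(messages, max_new_tokens=256):
    """Generate a reply with the SFT checkpoint.

    transformers is imported lazily so the sketch can be read without it
    installed; device_map="auto" additionally requires accelerate.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

# Example call (downloads the weights on first use):
# print(chat([{"role": "user", "content": "Kerro lyhyesti Suomesta."}]))
```

The same scaffold works for the DPO-tuned Poro 2 8B Instruct by swapping the model id, which makes side-by-side SFT-vs-DPO comparisons straightforward.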