Overview
YanoljaNEXT-EEVE-2.8B is a 2.8-billion-parameter language model developed by Yanolja, built on the microsoft/phi-2 architecture. Its primary innovation is Korean vocabulary expansion, achieved through a seven-stage training process that progressively integrates new Korean tokens while preserving the original base-model parameters. The method pre-trains input embeddings for the new tokens and partially fine-tunes the lm_head embeddings for the existing ones.
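A minimal sketch of what this vocabulary expansion with selective freezing could look like using the Hugging Face API; the two Korean tokens are placeholders for the curated token list, and the paper's per-stage freeze/unfreeze schedule is not reproduced here.

```python
# Sketch only: expand the vocabulary and train just the new-token input embeddings.
# The two Korean tokens below are placeholders for the 8,960 curated tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

old_vocab_size = model.get_input_embeddings().weight.shape[0]
tokenizer.add_tokens(["안녕하세요", "감사합니다"])  # placeholder new Korean tokens
model.resize_token_embeddings(len(tokenizer))

# Freeze everything, then allow gradients only on the (resized) input embeddings.
for param in model.parameters():
    param.requires_grad = False
embeddings = model.get_input_embeddings()
embeddings.weight.requires_grad = True

# Zero out gradient rows for the original vocabulary so that only the
# newly added Korean token embeddings receive updates.
def keep_new_rows_only(grad):
    grad = grad.clone()
    grad[:old_vocab_size] = 0
    return grad

embeddings.weight.register_hook(keep_new_rows_only)
```

Later stages of the actual training process also touch the lm_head embeddings and, eventually, other parameter groups, as described in the paper.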
Key Capabilities
- Korean Language Proficiency: Significantly enhanced understanding and generation of Korean text, thanks to a curated vocabulary of 8,960 new Korean tokens (see the tokenizer sketch after this list).
- Efficient Adaptation: Leverages the capabilities of its English foundation model to transfer knowledge and reasoning to Korean efficiently.
- Parameter-Efficient Training: Utilizes a parameter freezing approach during its seven-stage training process, optimizing the adaptation without retraining the entire model.
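To see the effect of the expanded vocabulary on Korean text, a quick comparison against the base phi-2 tokenizer can be illustrative. The EEVE repository ID below is an assumption based on the model name in this card, not a verified identifier.

```python
# Sketch: compare how many tokens a Korean sentence needs before and after
# vocabulary expansion. The EEVE repository ID is an assumed example.
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("microsoft/phi-2")
eeve_tok = AutoTokenizer.from_pretrained("yanolja/YanoljaNEXT-EEVE-2.8B")  # assumed repo ID

text = "야놀자가 개발한 한국어 언어 모델입니다."
print(len(base_tok.tokenize(text)))  # typically many byte-level fragments for Hangul
print(len(eeve_tok.tokenize(text)))  # fewer tokens with the expanded Korean vocabulary
```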
Good For
- Korean NLP Applications: Ideal for tasks requiring strong Korean language comprehension and generation (a minimal generation sketch follows this list).
- Research on Multilingual LLMs: Demonstrates an effective method for vocabulary expansion and cross-lingual transfer.
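As a quick illustration of Korean generation, here is a minimal sketch; the repository ID is an assumption based on the model name in this card.

```python
# Sketch: Korean text completion. The repository ID below is an assumption
# based on the model name in this card, not a verified identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yanolja/YanoljaNEXT-EEVE-2.8B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "제주도의 대표적인 관광지로는"  # "Representative tourist spots on Jeju Island include..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```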
Limitations
- This model has not been instruction-tuned, so it may require further fine-tuning for instruction-following applications; a minimal fine-tuning sketch follows below.
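If instruction-following behavior is needed, one option is a supervised fine-tuning pass with the standard Trainer API. The sketch below uses a single placeholder example; the repository ID and the prompt format are assumptions, not something this card prescribes.

```python
# Sketch: minimal supervised fine-tuning pass on instruction-style Korean data.
# Repository ID and prompt format are assumed; a real run needs a proper corpus.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "yanolja/YanoljaNEXT-EEVE-2.8B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Single placeholder example: "Question: What is the capital of South Korea?
# Answer: The capital of South Korea is Seoul."
examples = [{"text": "### 질문: 대한민국의 수도는 어디인가요?\n### 답변: 대한민국의 수도는 서울입니다."}]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eeve-sft", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```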
For more technical details, refer to the associated paper: Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models.