yanolja/YanoljaNEXT-EEVE-2.8B

Parameters: 2.8B · Precision: BF16 · Context length: 2048 · Released: Feb 22, 2024 · License: apache-2.0

Overview

YanoljaNEXT-EEVE-2.8B is a 2.8-billion-parameter language model developed by Yanolja, built on microsoft/phi-2. Its primary innovation is Korean vocabulary expansion: a seven-stage training process progressively integrates new Korean tokens while preserving the original base-model parameters. The method pre-trains embeddings for the new tokens and partially fine-tunes the lm_head embeddings of existing ones.
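
The staged freezing described above can be sketched in a few lines. The snippet below is a minimal, illustrative reconstruction in PyTorch and Hugging Face Transformers, not the authors' released training code: the placeholder token list and the gradient-hook mechanics are assumptions, and the sketch mirrors only the earliest of the seven stages, where everything except the new embedding rows stays frozen.

```python
# Minimal sketch of stage-one vocabulary expansion, assuming PyTorch and
# Transformers. The token list is a stand-in for the 8,960 curated tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Record the original vocabulary size, then add the new Korean tokens.
old_vocab_size = model.get_input_embeddings().weight.shape[0]
new_tokens = ["안녕하세요", "감사합니다"]  # hypothetical examples
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# Freeze everything, then re-enable gradients only on the embedding matrices.
for param in model.parameters():
    param.requires_grad = False
input_emb = model.get_input_embeddings().weight
output_emb = model.get_output_embeddings().weight
input_emb.requires_grad = True
output_emb.requires_grad = True

# requires_grad is per-tensor, so zero the gradient of the original rows on
# each backward pass; only the newly added rows are updated. Later stages
# progressively unfreeze more of the model (e.g., existing lm_head rows).
def zero_old_rows(grad):
    grad = grad.clone()
    grad[:old_vocab_size] = 0
    return grad

input_emb.register_hook(zero_old_rows)
output_emb.register_hook(zero_old_rows)
```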

Key Capabilities

  • Korean Language Proficiency: Significantly enhanced understanding and generation of Korean text due to a meticulously curated vocabulary of 8,960 new Korean tokens.
  • Efficient Adaptation: Leverages the inherent capabilities of foundational English models to efficiently transfer knowledge and reasoning to Korean.
  • Parameter-Efficient Training: Utilizes a parameter freezing approach during its seven-stage training process, optimizing the adaptation without retraining the entire model.

Good For

  • Korean NLP Applications: Ideal for tasks requiring strong Korean language comprehension and generation (see the usage sketch after this list).
  • Research on Multilingual LLMs: Demonstrates an effective recipe for vocabulary expansion and cross-lingual knowledge transfer.
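
The snippet below is a minimal usage sketch: the model id is assumed from this card's title, and the prompt and generation settings are illustrative.

```python
# Minimal generation sketch with the standard Transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yanolja/YanoljaNEXT-EEVE-2.8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Base model (not instruction-tuned): prompt with plain text to continue,
# not a chat template.
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```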

Limitations

  • This model has not been instruction-tuned; it will likely need supervised fine-tuning before use in instruction-following or chat applications (a minimal sketch follows).
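
One common approach is LoRA-based supervised fine-tuning, shown below as a minimal sketch with the PEFT library. The target module names assume phi-2's attention projection layers, and every hyperparameter is illustrative rather than taken from the EEVE paper.

```python
# Minimal LoRA fine-tuning sketch using the PEFT library.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("yanolja/YanoljaNEXT-EEVE-2.8B")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed phi-2 layer names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train with transformers.Trainer or trl's SFTTrainer on instruction data.
```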

For more technical details, refer to the associated paper: Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models.