yanolja/KoSOLAR-10.7B-v0.2

Text Generation | Concurrency Cost: 1 | Model Size: 15B | Quant: FP8 | Ctx Length: 8k | Tool Calling: Supported | Published: Jan 18, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

KoSOLAR-10.7B-v0.2 is a 10.7 billion parameter language model developed by yanolja, based on upstage/SOLAR-10.7B-v1.0. It is adapted for Korean through vocabulary expansion and targeted fine-tuning on a large Korean web-crawled corpus. The base model's original parameters are preserved, and training is concentrated on the embeddings for the new Korean tokens. The model is suited to applications that need strong Korean language understanding and generation.


KoSOLAR-10.7B-v0.2: Korean Language Enhancement

KoSOLAR-10.7B-v0.2 is a 10.7 billion parameter model developed by yanolja, built on upstage/SOLAR-10.7B-v1.0. Its main distinction from the base model is improved Korean language support.

Key Capabilities & Technical Approach

  • Korean Vocabulary Expansion: The vocabulary was extended with 8,960 new Korean tokens, selected by frequency in a 100GB Korean web corpus, and embeddings for these tokens were pre-trained.
  • Selective Fine-tuning: Training is partial. Most of the base model's original parameters are kept unchanged, and updates are limited to the embeddings for the new tokens and the lm_head embeddings for existing tokens. The aim is to retain the base model's original language capabilities while improving Korean proficiency.
  • Data Sources: The training corpus consisted of 83.46% Korean web content, 10.69% multilingual content (primarily English), and 5.86% English-to-Korean paragraph pairs.
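
The vocabulary expansion and selective update described above can be illustrated with a toy sketch in plain Python. It is not the authors' training code: the helper names (`select_new_tokens`, `expand_embeddings`, `masked_sgd_step`), the zero initialisation of new rows, the plain SGD update, and the toy data are all assumptions made for illustration. It also covers only the input-embedding side and leaves out the lm_head updates mentioned above.

```python
from collections import Counter


def select_new_tokens(corpus_tokens, existing_vocab, k):
    """Pick the k most frequent corpus tokens that are not already in the vocabulary."""
    counts = Counter(t for t in corpus_tokens if t not in existing_vocab)
    return [tok for tok, _ in counts.most_common(k)]


def expand_embeddings(embeddings, num_new, dim):
    """Append rows for the new tokens; existing rows are kept as they are.

    Zero initialisation is an assumption of this sketch.
    """
    return [list(row) for row in embeddings] + [[0.0] * dim for _ in range(num_new)]


def masked_sgd_step(embeddings, grads, trainable_rows, lr=0.1):
    """Apply an SGD update only to rows in trainable_rows; all other rows stay frozen."""
    return [
        [w - lr * g for w, g in zip(row, grad)] if i in trainable_rows else list(row)
        for i, (row, grad) in enumerate(zip(embeddings, grads))
    ]


# Toy vocabulary and corpus (hypothetical data).
existing_vocab = {"hello": 0, "world": 1}
corpus = ["안녕", "세계", "안녕", "hello", "감사", "안녕", "세계"]

# Frequency-based selection of new Korean tokens.
new_tokens = select_new_tokens(corpus, existing_vocab, k=2)

# Grow the embedding table by one row per new token.
base_embeddings = [[1.0, 2.0], [3.0, 4.0]]
expanded = expand_embeddings(base_embeddings, num_new=len(new_tokens), dim=2)

# Only the rows that belong to new tokens are trainable.
trainable_rows = set(range(len(base_embeddings), len(expanded)))
grads = [[1.0, 1.0] for _ in expanded]
updated = masked_sgd_step(expanded, grads, trainable_rows, lr=0.1)
```

In this toy run the two original rows come out of the update step unchanged, while the rows added for the new tokens move. That is the property the selective strategy relies on.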

Usage Considerations

  • Korean Language Tasks: The model is best suited to tasks that require Korean language understanding and generation.
  • No Instruction-tuning: The model has not been instruction-tuned, so it may need further fine-tuning before use in instruction-following applications.
  • Preserved Original Capabilities: Because most base parameters are frozen, the model is intended to retain the base model's original language capabilities alongside the added Korean proficiency.
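
Since the model is not instruction-tuned, one common way to use a base model without further fine-tuning is to frame the task as plain-text completion with a few worked examples. The sketch below only builds such a prompt string. The function name `build_few_shot_prompt`, the "질문"/"답변" ("question"/"answer") labels, and the example pairs are hypothetical choices, not a format documented for this model.

```python
def build_few_shot_prompt(examples, query, input_label="질문", output_label="답변"):
    """Format (input, output) pairs plus a final open query as a completion prompt."""
    blocks = [f"{input_label}: {q}\n{output_label}: {a}" for q, a in examples]
    # Leave the last answer empty so the model continues the text from here.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Hypothetical examples: "What is the capital of Korea?" / "It is Seoul.", and so on.
examples = [
    ("한국의 수도는 어디인가요?", "서울입니다."),
    ("일본의 수도는 어디인가요?", "도쿄입니다."),
]
# Final query: "What is the capital of France?"
prompt = build_few_shot_prompt(examples, "프랑스의 수도는 어디인가요?")
```

The resulting string can be passed to whichever text-generation interface is in use. Whether few-shot prompting is sufficient for a given task, or further fine-tuning is needed, has to be checked empirically.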