KoSaul-8B: Specialized Korean LLM
KoSaul-8B is an 8-billion-parameter language model developed by Ingeol Baek, built on the Open-ko-llama3-8B architecture. The model was continually pre-trained on specialized Korean datasets to strengthen its performance in the legal and medical domains.
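A minimal loading-and-generation sketch with Hugging Face transformers is shown below. The repository id `ingeol/KoSaul-8B` is a placeholder assumption, not confirmed by this card; substitute the model's actual path.

```python
# Minimal sketch: loading KoSaul-8B with Hugging Face transformers.
# NOTE: the repo id is a placeholder assumption; replace it with the
# actual Hugging Face path for KoSaul-8B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ingeol/KoSaul-8B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 8B weights; bf16 halves memory vs fp32
    device_map="auto",
)

# "Explain the content of Article 1 of the Korean Civil Act."
prompt = "대한민국 민법 제1조의 내용을 설명해 주세요."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```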
Key Capabilities & Training
- Domain-Specific Training: The model was trained on a combination of Korean datasets, including:
  - Data crawled from the National Law Information Center Open API.
  - The AI-hub legal knowledge base.
  - AI-hub medical and legal professional book corpora.
- Optimized for Korean Legal & Medical Contexts: The specialized training data makes KoSaul-8B particularly adept at understanding and generating content related to Korean law and medicine.
- Improved Perplexity: On a legal-domain benchmark, KoSaul-8B achieves a perplexity of 2.649 (lower is better), outperforming other Korean LLMs in its class such as Open-Llama3-8B (3.529) and KULLM3 (2.903). A sketch of how such a perplexity score can be computed appears after this list.
- Technical Specifications: Trained with a batch size of 96, a training context length of 1024 tokens (the model itself supports an 8192-token context), and the AdamW optimizer; see the configuration sketch below.
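Perplexity is the exponential of the mean token-level cross-entropy. The evaluation corpus and windowing behind the numbers above are not specified in this card, so the following is a generic recipe rather than the author's script; it reuses the `model` and `tokenizer` objects from the loading sketch.

```python
# Generic perplexity computation: PPL = exp(mean next-token cross-entropy).
# The exact evaluation text and windowing used for the reported scores are
# not stated in this card; this is an illustrative recipe only.
import math
import torch

def perplexity(model, tokenizer, text: str, max_length: int = 1024) -> float:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    input_ids = enc.input_ids.to(model.device)
    with torch.no_grad():
        # Passing labels makes the causal LM return the mean cross-entropy
        # loss over its next-token predictions.
        loss = model(input_ids, labels=input_ids).loss
    return math.exp(loss.item())

legal_sample = "제1조(목적) 이 법은 ..."  # a held-out Korean legal passage
print(f"PPL: {perplexity(model, tokenizer, legal_sample):.3f}")
```

The stated hyperparameters also map naturally onto a Hugging Face `TrainingArguments` configuration. The sketch below matches the effective batch size of 96, the 1024-token training length, and AdamW; the learning rate, epoch count, and per-device batch split are assumptions, since the card does not state them.

```python
# Illustrative continual pre-training setup matching the stated hyperparameters
# (effective batch size 96, sequence length 1024, AdamW). Learning rate,
# epochs, and the GPU/accumulation split are assumptions, not from the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="kosaul-8b-cpt",
    per_device_train_batch_size=4,    # assumption: 8 GPUs x 4 x grad_accum 3 = 96
    gradient_accumulation_steps=3,
    optim="adamw_torch",              # AdamW, as stated above
    learning_rate=2e-5,               # assumed value
    bf16=True,
    num_train_epochs=1,               # assumed
    logging_steps=50,
)
# The 1024-token training length is enforced at tokenization time, e.g.
# tokenizer(batch["text"], truncation=True, max_length=1024)
```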
Good For
- Applications requiring deep understanding and generation of Korean legal texts.
- Use cases in the Korean medical domain that benefit from specialized language models.
- Developers looking for a Llama 3-based model with improved performance on specialized Korean professional corpora.