Overview
Llama-3-KoEn-8B: A Bilingual Llama 3 Adaptation
Llama-3-KoEn-8B is an 8-billion-parameter language model developed by Junbum Lee (Beomi), built on the Llama-3-8B architecture. It was continually pretrained on a combined Korean and English corpus, distinguishing it from the original Llama 3, which is intended primarily for English use. Training was conducted on TPUv4-256 hardware.
Key Capabilities & Features
- Bilingual Proficiency: Specifically trained on Korean and English data, making it adept at handling both languages.
- Llama 3 Architecture: Benefits from the optimized transformer architecture of Llama 3, including Grouped Query Attention (GQA).
- Context Length: Supports an 8,192-token (8k) context length, allowing it to process longer inputs.
- Pretrained Model: This is a base pretrained model, suitable for adaptation to a range of natural language generation tasks; a minimal loading and generation sketch follows this list.
- Instruction-tuned Preview: A related instruction-tuned preview model, Llama-3-KoEn-8B-Instruct-preview, offers a starting point for chat/instruct applications, though it has not yet been fine-tuned on Korean instruction sets.
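Since this is a base pretrained model, the simplest way to try it is plain text continuation rather than instruction following. The sketch below is a minimal example, assuming the checkpoint is published on the Hugging Face Hub under an identifier such as `beomi/Llama-3-KoEn-8B` (an assumption, not stated in this card) and that `torch` and `transformers` are installed:

```python
# Minimal sketch: load the base model and continue a Korean prompt.
# The Hub identifier below is an assumption for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beomi/Llama-3-KoEn-8B"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights in bf16 for an 8B model
    device_map="auto",
)

# The base model completes text rather than following instructions,
# so the prompt should read like text to be continued.
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```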
Intended Use Cases
This model is designed for commercial and research applications requiring text generation in both Korean and English. While the base Llama 3 is primarily English-focused, Llama-3-KoEn-8B's bilingual continued pretraining makes it a strong candidate for bilingual tasks. Developers can fine-tune the model for specific applications (a hedged sketch follows below), provided they adhere to the CC-BY-NC-SA-4.0 and Llama 3 licenses.
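To make the fine-tuning path concrete, here is a hedged sketch of parameter-efficient adaptation with LoRA via the `peft` library. The Hub identifier, dataset file (`corpus.txt`), target modules, and hyperparameters are illustrative assumptions, not values from the model card:

```python
# Hedged LoRA fine-tuning sketch; all names and hyperparameters below are
# illustrative assumptions, not prescriptions from the model card.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "beomi/Llama-3-KoEn-8B"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections so only a small
# fraction of the parameters is trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder corpus: swap in your own Korean/English text data.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama3-koen-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

LoRA keeps the 8B base weights frozen and trains only small adapter matrices, which keeps memory requirements within reach of a single GPU; full fine-tuning is also possible but considerably more expensive.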