beomi/Llama-3-KoEn-8B

Parameters: 8B
Precision: FP8
Context length: 8192 tokens
License: llama3
Overview

Llama-3-KoEn-8B: A Bilingual Llama 3 Adaptation

Llama-3-KoEn-8B is an 8-billion-parameter language model developed by Junbum Lee (Beomi) and built on the Llama-3-8B architecture. It was produced by continued pretraining on a combined Korean and English corpus, which distinguishes it from the original Llama 3, a model intended primarily for English use. Training was conducted on TPUv4-256 hardware.

Key Capabilities & Features

  • Bilingual Proficiency: Continued pretraining on Korean and English data targets text in both languages.
  • Llama 3 Architecture: Inherits the Llama 3 transformer architecture, including Grouped Query Attention (GQA).
  • Context Length: Supports a context length of 8,192 tokens (8k).
  • Pretrained Model: This is a pretrained (base) model, suitable for adaptation to various natural language generation tasks.
  • Instruction-tuned Preview: A related instruction-tuned preview model, Llama-3-KoEn-8B-Instruct-preview, is available as a starting point for chat/instruct applications, though it has not yet been fine-tuned on Korean instruction sets.
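
To make the GQA and context-length points concrete, the following rough sketch estimates the key/value cache size for one sequence at the full 8,192-token context. The layer count, head counts, and head dimension are assumptions taken from the public Llama 3 8B configuration; they are not stated in this overview, and the 2-byte value size assumes a 16-bit cache.

```python
# Architecture values assumed from the public Llama 3 8B configuration;
# they are not stated in this overview.
N_LAYERS = 32
N_QUERY_HEADS = 32
N_KV_HEADS = 8        # GQA: 4 query heads share each key/value head
HEAD_DIM = 128
CONTEXT_LEN = 8192


def kv_cache_bytes(n_tokens, n_kv_heads, bytes_per_value=2):
    """Bytes needed to cache keys and values for n_tokens of one sequence."""
    per_token_values = 2 * N_LAYERS * n_kv_heads * HEAD_DIM  # 2 = keys + values
    return n_tokens * per_token_values * bytes_per_value


GIB = 1024 ** 3
gqa_gib = kv_cache_bytes(CONTEXT_LEN, N_KV_HEADS) / GIB
# Hypothetical comparison: the same model with one KV head per query head.
mha_gib = kv_cache_bytes(CONTEXT_LEN, N_QUERY_HEADS) / GIB

print(f"GQA cache at full context: {gqa_gib:.1f} GiB")
print(f"Without KV-head sharing:   {mha_gib:.1f} GiB")
```

Under these assumptions, sharing each key/value head across four query heads cuts the per-sequence cache to a quarter of what it would otherwise be, which matters most when the full context window is used.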

Intended Use Cases

This model is designed for applications requiring text generation in both Korean and English. While the base Llama 3 is primarily English-focused, the continued Korean and English pretraining of Llama-3-KoEn-8B makes it a candidate for bilingual tasks. Developers can fine-tune this model for specific applications, subject to both the CC-BY-NC-SA-4.0 and Llama 3 licenses. Because CC-BY-NC-SA-4.0 permits only noncommercial use, research and other noncommercial applications are the uses it covers; commercial deployment is not covered by that license.
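
As a starting point for experimentation, here is a minimal sketch of loading the base model for text continuation with the Hugging Face `transformers` library. It assumes the weights are published under the repository ID shown in the page title, that `torch`, `transformers`, and `accelerate` are installed, and that the hardware supports bfloat16. The helper names `clamp_new_tokens` and `complete` are hypothetical and exist only for this illustration.

```python
MODEL_ID = "beomi/Llama-3-KoEn-8B"
CONTEXT_LEN = 8192


def clamp_new_tokens(prompt_tokens, requested, context_len=CONTEXT_LEN):
    """Limit generated tokens so prompt + output stays inside the context window."""
    if prompt_tokens >= context_len:
        raise ValueError("prompt already fills the context window")
    return min(requested, context_len - prompt_tokens)


def complete(prompt, max_new_tokens=64):
    """Continue `prompt` with the base model (downloads the weights on first use)."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumed dtype; pick what your hardware supports
        device_map="auto",           # requires the `accelerate` package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    budget = clamp_new_tokens(inputs["input_ids"].shape[1], max_new_tokens)
    output_ids = model.generate(**inputs, max_new_tokens=budget)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `complete("대한민국의 수도는")` would return the prompt followed by the model's continuation. Since this is a base model rather than an instruction-tuned one, prompts work best when phrased as text to be continued, not as chat-style instructions.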