masterjae/T-Llama-3-8B
T-Llama-3-8B: Korean Continual Pre-trained Model
T-Llama-3-8B is an 8 billion parameter language model developed by TmaxAI, continually pre-trained from Meta-Llama-3-8B. It was trained on 18.8 billion tokens drawn from a 64GB Korean corpus, substantially improving its proficiency in Korean. The model supports an 8192-token context length, making it suitable for processing longer texts.
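The model card does not ship usage code, but a minimal loading sketch with Hugging Face transformers might look like the following, assuming the repository follows the standard Llama-3 layout (the dtype and attention settings mirror the training-time optimizations listed below and are choices here, not documented defaults):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal loading sketch, assuming the repo follows the standard Llama-3 layout.
model_id = "masterjae/T-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # matches the BF16 training precision
    attn_implementation="flash_attention_2",  # optional; requires flash-attn installed
    device_map="auto",                        # requires accelerate installed
)
```

On hardware without FlashAttention-2 support, dropping the `attn_implementation` argument falls back to the default attention implementation.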
Key Capabilities
- Enhanced Korean Language Understanding: Specialized training on a large Korean corpus significantly improves its performance for Korean-centric tasks.
- Multilingual Translation: Demonstrates strong multilingual machine translation (MMT) capabilities, outperforming the base Llama-3-8B and achieving competitive results against other leading Korean continual pre-training models, particularly in the EN→KO, KO→EN, JA→KO, KO→JA, ZH→KO, and KO→ZH directions (see the prompt sketch after this list).
- Optimized Performance: Integrates advanced optimization techniques including DeepSpeed, gradient checkpointing, FlashAttention-2, and BF16 mixed precision for efficient training and inference.
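Because T-Llama-3-8B is a continually pre-trained base model rather than an instruction-tuned one, translation is most naturally elicited as plain text completion. The prompt format below is an illustrative assumption (no official template is documented), and it reuses the `tokenizer` and `model` from the loading sketch above:

```python
# Completion-style KO->EN translation prompt; the format is an assumption,
# not a template documented by the model card.
# The Korean sentence means "The weather is really nice today."
prompt = "한국어: 오늘 날씨가 정말 좋네요.\nEnglish:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,  # greedy decoding keeps translation output deterministic
)

# Strip the prompt tokens and print only the newly generated continuation.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Few-shot prompts (several source/target pairs before the final source sentence) typically give a base model a stronger signal for the intended translation direction.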
Good For
- Applications requiring robust Korean language processing.
- Multilingual translation tasks involving Korean.
- Researchers and developers looking for a high-performance, continually pre-trained Korean LLM.