masterjae/T-Llama-3-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Oct 2, 2024 · Architecture: Transformer

T-Llama-3-8B is an 8-billion-parameter Korean continual pre-trained language model developed by TmaxAI, based on Meta-Llama-3-8B. It was trained on 18.8 billion tokens from a 64 GB Korean corpus and supports an 8192-token context length. Trained with DeepSpeed, gradient checkpointing, FlashAttention-2, and BF16 mixed precision, it significantly outperforms the original Llama-3-8B and achieves performance comparable to leading Korean continual pre-training models on multilingual translation tasks.


T-Llama-3-8B: Korean Continual Pre-trained Model

T-Llama-3-8B is an 8 billion parameter language model developed by TmaxAI, continually pre-trained on the Meta-Llama-3-8B architecture. It was trained on a substantial 18.8 billion tokens from a 64GB Korean corpus, enhancing its proficiency in the Korean language. The model supports an 8192-token sequence length, making it suitable for processing longer texts.
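As a sketch of typical usage, the model can be loaded with the Hugging Face `transformers` library. The repo id below is taken from this page's header, and the BF16 and FlashAttention-2 settings mirror the training configuration described here; the `truncate_to_context` helper and the 256-token output reserve are illustrative assumptions, not part of the official model card.

```python
MODEL_ID = "masterjae/T-Llama-3-8B"  # repo id from this page's header
MAX_CTX = 8192                        # context length stated in the model card


def truncate_to_context(token_ids, reserve_for_output=256, max_ctx=MAX_CTX):
    """Keep a prompt inside the 8192-token window, leaving room for generation.

    The 256-token reserve is an illustrative default, not a documented value.
    """
    budget = max_ctx - reserve_for_output
    return token_ids[-budget:] if len(token_ids) > budget else token_ids


def run_demo():
    # Sketch only: requires `transformers`, `torch`, and a suitable GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,               # BF16, matching the training setup
        attn_implementation="flash_attention_2",  # optional; needs flash-attn installed
        device_map="auto",
    )
    prompt = "한국의 수도는 어디인가요?"  # "What is the capital of Korea?"
    ids = truncate_to_context(tokenizer(prompt)["input_ids"])
    inputs = torch.tensor([ids]).to(model.device)
    output = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Since the model is served here in FP8 but was trained in BF16, loading in FP8 locally would require a separate quantization runtime; the sketch above sticks to the training precision.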

Key Capabilities

  • Enhanced Korean Language Understanding: Specialized training on a large Korean corpus significantly improves its performance for Korean-centric tasks.
  • Multilingual Translation: Demonstrates strong multilingual machine translation (MMT) performance, outperforming the base Llama-3-8B and achieving competitive results against other leading Korean continual pre-training models, particularly in the EN<->KO, JA<->KO, and ZH<->KO directions.
  • Optimized Training: Integrates advanced optimization techniques, including DeepSpeed, gradient checkpointing, FlashAttention-2, and BF16 mixed precision, for efficient training and inference.
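Because T-Llama-3-8B is a continual pre-trained base model rather than an instruction-tuned one, translation is typically elicited with a few-shot prompt. The exact prompt template TmaxAI used for its MMT evaluation is not documented on this page, so the format below, including the `build_translation_prompt` helper, is an assumption for illustration.

```python
# Hedged sketch: this few-shot prompt format is an assumption, not the
# documented evaluation template for T-Llama-3-8B.

LANG_NAMES = {"EN": "English", "KO": "Korean", "JA": "Japanese", "ZH": "Chinese"}


def build_translation_prompt(src, tgt, text, examples=()):
    """Build a few-shot translation prompt for a base (non-instruct) model.

    `examples` is an iterable of (source_text, target_text) demonstration pairs.
    """
    header = f"Translate from {LANG_NAMES[src]} to {LANG_NAMES[tgt]}.\n\n"
    shots = "".join(
        f"{LANG_NAMES[src]}: {s}\n{LANG_NAMES[tgt]}: {t}\n\n" for s, t in examples
    )
    # End on the target-language label so the model continues with the translation.
    return header + shots + f"{LANG_NAMES[src]}: {text}\n{LANG_NAMES[tgt]}:"
```

A prompt built this way would be tokenized and passed to `model.generate` as usual; stopping at the first newline in the completion is a common way to isolate the translated sentence.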

Good For

  • Applications requiring robust Korean language processing.
  • Multilingual translation tasks involving Korean.
  • Researchers and developers looking for a high-performance, continually pre-trained Korean LLM.