Overview
Llama 3.1 Swallow 8B Instruct v0.1: Enhanced Japanese Capabilities
This model is an 8-billion-parameter instruction-tuned variant from the Llama 3.1 Swallow series, developed by tokyotech-llm. It was built by continual pre-training of the Meta Llama 3.1 base model, with a specific focus on substantially enhancing Japanese language capabilities while retaining strong English performance.
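As a minimal sketch of basic usage, the model can be loaded with Hugging Face transformers. The snippet below assumes the Hub repository ID tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1 (following the series naming) and bf16 weights; it is illustrative rather than an official example.

```python
# Minimal loading sketch; assumes the model is published on the Hugging Face Hub
# under tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B weights fit in roughly 16 GB at bf16
    device_map="auto",           # place layers on available GPU(s)
)
```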
Key Capabilities
- Bilingual Proficiency: Excels in both Japanese and English, with a particular focus on Japanese language tasks.
- Continual Pre-training: Uses approximately 200 billion tokens drawn from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical and coding content.
- Instruction-Tuned: Supervised fine-tuning (SFT) was performed with synthetic data designed specifically for Japanese (see the usage sketch after this list).
- Strong Japanese Benchmarks: Achieves leading scores on Japanese evaluation benchmarks such as JCommonsenseQA, JEMHopQA, NIILC, and JSQuAD, and performs competitively on MT-Bench JA.
- Llama 3.1 Foundation: Benefits from the robust architecture and tokenizer of the Meta Llama 3.1 models.
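Because the model is instruction-tuned on top of Llama 3.1, conversations can be formatted with the tokenizer's built-in chat template. The sketch below continues from the loading snippet above; the Japanese prompt and sampling parameters are illustrative assumptions, not official recommendations.

```python
# Hedged sketch of Japanese instruction-following via the chat template.
messages = [
    # "You are a sincere and excellent Japanese assistant."
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    # "Tell me about Tokyo Tower."
    {"role": "user", "content": "東京タワーについて教えてください。"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model replies
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```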
Good for
- Applications requiring high-quality Japanese language understanding and generation.
- Bilingual (Japanese-English) conversational AI and instruction-following tasks.
- Research and development in cross-lingual LLM adaptation, particularly for Japanese.