tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1 is an 8-billion-parameter instruction-tuned causal language model from tokyotech-llm, built on the Meta Llama 3 family. Continual pre-training with a primary focus on Japanese data substantially improves its performance on Japanese tasks, and instruction tuning makes it well suited to bilingual (Japanese-English) applications that require nuanced understanding and generation.
Model Overview
The model builds on the Meta Llama 3 architecture and tokenizer and has undergone continual pre-training with a significant addition of Japanese language data. The instruction-tuned versions are produced with supervised fine-tuning (SFT) and Chat Vector techniques.
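Because the model is chat-tuned, prompts should be rendered through the tokenizer's chat template rather than passed as raw text. The sketch below shows the typical Hugging Face transformers workflow for a Llama 3 derivative; the `build_messages` and `format_prompt` helpers and the example prompts are illustrative assumptions, not part of the model card.

```python
from transformers import AutoTokenizer

# Real Hugging Face repository id; the helpers below are illustrative.
MODEL_ID = "tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1"

def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Assemble a chat in the role/content format used by apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def format_prompt(tokenizer, messages) -> str:
    """Render messages into a single Llama 3 prompt string (no tokenization)."""
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Fetching the tokenizer requires network access, so it is left commented out:
# tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# print(format_prompt(tokenizer, build_messages(
#     "あなたは誠実で優秀な日本人のアシスタントです。",  # "You are a sincere and excellent Japanese assistant."
#     "東京の観光名所を3つ教えてください。",            # "Tell me three sightseeing spots in Tokyo."
# )))
```

The system/user message list is the standard interface for Llama 3 chat models; `add_generation_prompt=True` appends the assistant header so the model continues as the assistant turn.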
Key Capabilities
- Enhanced Japanese Language Performance: Demonstrates strong performance across various Japanese benchmarks, including JCom., JEMHopQA, NIILC, JSQuAD, XL-Sum, MGSM, WMT20 (en-ja/ja-en), JMMLU, and JHumanEval.
- Bilingual Instruction Following: Excels in instruction-following tasks in both Japanese and English, as evidenced by its MT-Bench JA scores and English task benchmarks (OpenBookQA, TriviaQA, HellaSWAG, SQuAD2.0, XWINO, MMLU, GSM8K, BBH, HumanEval).
- Llama 3 Foundation: Benefits from the robust architecture and tokenizer of the Meta Llama 3 family.
When to Use This Model
This model is particularly well-suited for applications requiring:
- High-quality Japanese text generation and understanding.
- Bilingual (Japanese-English) conversational AI and instruction following.
- Tasks involving Japanese question answering, summarization, translation, and code generation.
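For these use cases, inference follows the standard transformers generation loop. The following is a minimal sketch; the `generate_reply` helper and the sampling settings (temperature 0.6, top-p 0.9) are illustrative assumptions, not values prescribed by the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1"

def generate_reply(model, tokenizer, messages, max_new_tokens=256):
    """Apply the chat template, sample a completion, and return only the new text."""
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.6,  # illustrative sampling settings, not prescribed values
        top_p=0.9,
    )
    # Drop the prompt tokens so only the assistant's reply is decoded.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Loading the 8B weights needs substantial memory and network access,
# so the calls are left commented out:
# tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, torch_dtype="auto", device_map="auto"
# )
# print(generate_reply(model, tokenizer,
#     [{"role": "user", "content": "日本の祝日について英語で説明してください。"}]))
#     # "Explain Japanese public holidays in English." — a bilingual prompt
```

Slicing the output at `input_ids.shape[-1]` is the usual way to separate the generated reply from the echoed prompt when decoding.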