Swallow-70b-instruct-hf: Japanese-Enhanced Llama 2 Model
Swallow-70b-instruct-hf is a 70-billion-parameter instruction-tuned language model developed by TokyoTech-LLM. Built on the Llama 2 architecture, it was continually pre-trained on large amounts of Japanese text, substantially improving its Japanese proficiency. A key feature is its tokenizer, whose vocabulary is expanded with Japanese tokens, yielding more compact text representation and faster inference.
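As a minimal sketch of how the model might be loaded and prompted with Hugging Face transformers: the `tokyotech-llm/Swallow-70b-instruct-hf` model ID and the Japanese Alpaca-style prompt template below follow the conventions published for the Swallow family, but both are assumptions here and should be verified against the official model card before use.

```python
# Minimal sketch: loading Swallow-70b-instruct-hf with Hugging Face transformers.
# Assumes the model ID and prompt template match the official model card;
# a 70B model needs multiple high-memory GPUs (device_map="auto" shards it).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Swallow-70b-instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory footprint
    device_map="auto",           # shard layers across available GPUs
)

# Alpaca-style Japanese instruction template (assumed; check the model card).
# The preamble says: "Below is an instruction describing a task. Write a
# response that appropriately completes the request."
prompt = (
    "以下に、あるタスクを説明する指示があります。"
    "リクエストを適切に完了するための回答を記述してください。\n\n"
    "### 指示:\n"
    "東京工業大学について簡単に説明してください。\n\n"  # "Briefly describe Tokyo Tech."
    "### 応答:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```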
Key Capabilities and Performance
- Superior Japanese Language Performance: The model demonstrates strong performance across a range of Japanese benchmarks, including JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, XL-Sum, WMT20, and MGSM. For instance, the 70B Swallow model achieves 0.9348 on JCommonsenseQA and 0.4840 on MGSM, outperforming its Llama 2 counterpart in most Japanese tasks.
- Efficient Tokenization: Uses a tokenizer with an expanded Japanese vocabulary, allowing more compact text representation and improved inference speed (see the tokenizer sketch after this list).
- Instruction-Tuned: This specific version is instruction-tuned, making it suitable for conversational and instruction-following tasks.
- Competitive English Performance: While optimized for Japanese, the model maintains solid performance on English benchmarks such as OpenBookQA, TriviaQA, HellaSwag, SQuAD2.0, XWINO, and GSM8K.
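The tokenizer efficiency claim can be checked directly by counting tokens on the same Japanese sentence with both tokenizers. This is an illustrative sketch, not a benchmark: `meta-llama/Llama-2-70b-hf` is a gated repository, so substitute any Llama 2 tokenizer you have access to, and expect exact counts to vary with the input text.

```python
# Sketch: comparing how compactly each tokenizer encodes Japanese text.
# Fewer tokens per sentence means less compute per request and faster generation.
# meta-llama/Llama-2-70b-hf is gated; substitute any Llama 2 tokenizer you can
# access. The counts printed here are illustrative, not published figures.
from transformers import AutoTokenizer

# "Tokyo Institute of Technology is a top-tier Japanese science and engineering university."
text = "東京工業大学は、日本のトップクラスの理工系大学です。"

swallow_tok = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-70b-instruct-hf")
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

for name, tok in [("Swallow", swallow_tok), ("Llama 2", llama2_tok)]:
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: {len(ids)} tokens for {len(text)} characters")
```

Because the expanded vocabulary encodes common Japanese words as single tokens rather than byte fragments, the Swallow tokenizer should produce noticeably fewer tokens for the same sentence, which translates directly into lower latency and cost per request.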
Training Details
The model underwent continual pre-training on diverse datasets including Japanese Wikipedia, RefinedWeb, the Swallow Corpus, and The Pile. Instruction tuning was performed with datasets such as Anthropic HH-RLHF, Databricks Dolly 15k, and the OpenAssistant Conversations Dataset, all adapted for Japanese.
Good for
- Applications requiring high-quality Japanese language understanding and generation.
- Instruction-following tasks in Japanese.
- Scenarios where efficient processing of Japanese text is crucial due to optimized tokenization.
- Research and development in multilingual LLMs, particularly for Japanese-English contexts.