Swallow-7b-NVE-instruct-hf: Japanese-Enhanced Llama 2 Instruction Model
The tokyotech-llm/Swallow-7b-NVE-instruct-hf is a 7 billion parameter instruction-tuned model developed by TokyoTech-LLM. It is built upon the Llama 2 architecture and has undergone continual pre-training on a large amount of Japanese language data, making it highly proficient in Japanese. This particular model is an "NVE" (No Vocabulary Expansion) variant, meaning it retains the standard Llama 2 tokenizer rather than one with an expanded Japanese vocabulary. As a result, it typically produces more tokens per Japanese character, and is therefore less token-efficient on Japanese text, than the vocabulary-expanded Swallow models.
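As a toy illustration of why vocabulary expansion matters for token efficiency (a minimal sketch, not the real tokenizers: it assumes the unexpanded tokenizer falls back to UTF-8 bytes for Japanese characters absent from its vocabulary, as the Llama 2 SentencePiece tokenizer does, and it uses a small hypothetical expanded vocabulary):

```python
# Toy illustration (not the actual Swallow or Llama 2 tokenizers): how
# byte-level fallback inflates token counts for Japanese text compared
# with a vocabulary that contains whole Japanese words.

def byte_fallback_token_count(text: str) -> int:
    """Worst case without Japanese vocabulary: every character falls
    back to its UTF-8 bytes, one token per byte."""
    return len(text.encode("utf-8"))

def expanded_vocab_token_count(text: str, vocab: set[str]) -> int:
    """Greedy longest-match segmentation against a toy expanded
    vocabulary; characters not covered still fall back to bytes."""
    count, i = 0, 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            if text[i:i + length] in vocab:
                count += 1
                i += length
                break
        else:
            count += len(text[i].encode("utf-8"))  # byte fallback
            i += 1
    return count

toy_vocab = {"東京", "工業", "大学"}  # hypothetical merged Japanese tokens
text = "東京工業大学"

print(byte_fallback_token_count(text))              # 18 (6 chars x 3 bytes)
print(expanded_vocab_token_count(text, toy_vocab))  # 3
```

The same six-character string costs 18 tokens under pure byte fallback but only 3 with the toy expanded vocabulary, which is the efficiency trade-off the NVE variant gives up.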
Key Capabilities and Features
- Japanese Language Proficiency: Excels in various Japanese tasks, outperforming the base Llama 2 7B model across benchmarks like JCommonsenseQA, JEMHopQA, NIILC, and JSQuAD.
- Instruction Following: Fine-tuned using supervised fine-tuning (SFT) on datasets including Japanese translations of Anthropic HH-RLHF, Databricks Dolly 15k, and OpenAssistant Conversations Dataset.
- Bilingual Support: While optimized for Japanese, it also handles English, though its performance on English benchmarks is generally lower than that of the original Llama 2.
- Llama 2 Foundation: Inherits the robust architecture of the Llama 2 family.
Training and Datasets
The model was continually pre-trained on a diverse corpus including Japanese Wikipedia, RefinedWeb, The Pile, and the project's own Swallow Corpus. Instruction tuning leveraged Japanese versions of popular instruction datasets.
Usage Considerations
This model is suitable for applications requiring strong instruction-following capabilities in Japanese. Developers should note that, because this NVE variant lacks the expanded Japanese vocabulary, the vocabulary-expanded Swallow models may offer faster Japanese inference by emitting fewer tokens per sentence. The model is still in early research stages and has not been extensively tuned for safety or human alignment.