Overview
Swallow-13b-NVE-hf is a 13-billion-parameter language model developed by TokyoTech-LLM, built on the Llama 2 architecture. It was continually pre-trained with a large amount of additional Japanese-language data to improve performance on Japanese-centric tasks. The "NVE" (No Vocabulary Expansion) designation means this variant retains the original Llama 2 tokenizer rather than the Japanese-expanded vocabulary used by other Swallow models.
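As a minimal sketch of how the model might be loaded for inference, the snippet below uses Hugging Face `transformers` with the assumed Hub id `tokyotech-llm/Swallow-13b-NVE-hf`; the dtype, device placement, and generation settings are illustrative, not prescribed.

```python
# Hypothetical inference sketch; model id, dtype, and generation settings
# are assumptions, not an official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tokyotech-llm/Swallow-13b-NVE-hf"  # assumed Hugging Face Hub id

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a continuation of `prompt` with the Swallow NVE model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # ~26 GB of weights; bf16 halves memory
        device_map="auto",           # spread layers across available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Japanese prompt, since that is the model's focus.
    print(generate("東京工業大学の主なキャンパスは、"))
```

Because NVE keeps the stock Llama 2 tokenizer, no custom tokenizer files are needed; the same pipeline that serves Llama 2 13B should work unchanged.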
Key Capabilities
- Strong Japanese Language Performance: Benchmarks show significant improvements over Llama 2 13B on various Japanese tasks, including JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, and MGSM.
- Retained English Capabilities: While optimized for Japanese, the model remains broadly competitive with its Llama 2 base on English benchmarks, though some scores are slightly lower, an expected trade-off of continual pre-training focused on Japanese data.
- Llama 2 Foundation: Benefits from the robust architecture and pre-training of the Llama 2 family.
Should you use this for your use case?
- Japanese Language Applications: Ideal for tasks requiring high proficiency in Japanese text understanding, generation, and reasoning.
- Research and Development: Suitable for researchers exploring cross-lingual adaptation and the impact of continual pre-training on specific languages.
- Base Model for Fine-tuning: Can serve as a strong base for further fine-tuning on specific Japanese-language downstream tasks, especially if vocabulary expansion is not desired or causes compatibility issues with existing pipelines.