tokyotech-llm/Swallow-13b-NVE-hf
Swallow-13b-NVE-hf, from tokyotech-llm, is a 13-billion-parameter language model continually pre-trained from the Llama 2 family on additional Japanese-language data. This NVE (No Vocabulary Expansion) version targets strong performance on Japanese tasks while maintaining English capabilities. It suits applications that need robust Japanese language understanding and generation without an expanded vocabulary.
Overview
Swallow-13b-NVE-hf is a 13-billion-parameter language model developed by TokyoTech-LLM and built on the Llama 2 architecture. It was continually pre-trained with a substantial amount of added Japanese-language data to improve its performance on Japanese-centric tasks. The "NVE" (No Vocabulary Expansion) suffix marks a variant whose vocabulary was not expanded using Japanese data, which distinguishes it from the other Swallow models that do use an expanded vocabulary.
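One way to see what the NVE distinction means in practice is to count how many tokens each tokenizer produces for the same Japanese text. The following is a minimal sketch, not an official recipe. It assumes the Hugging Face `transformers` library is installed and that the expanded-vocabulary sibling is published as `tokyotech-llm/Swallow-13b-hf` (an assumption; only the NVE repository name appears on this page). The helper names `tokens_per_char` and `compare_tokenizers` are hypothetical.

```python
from typing import Callable, Dict, List

# Repository id taken from this page.
NVE_REPO = "tokyotech-llm/Swallow-13b-NVE-hf"
# Assumption: repository id of the sibling model with the expanded vocabulary.
EXPANDED_REPO = "tokyotech-llm/Swallow-13b-hf"


def tokens_per_char(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Return the average number of tokens produced per character of text."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(tokenize(text)) / len(text)


def compare_tokenizers(text: str) -> Dict[str, float]:
    """Download both tokenizers and report tokens per character for each."""
    # Imported lazily so tokens_per_char works without transformers installed.
    from transformers import AutoTokenizer

    results = {}
    for repo in (NVE_REPO, EXPANDED_REPO):
        tokenizer = AutoTokenizer.from_pretrained(repo)
        results[repo] = tokens_per_char(tokenizer.tokenize, text)
    return results
```

Calling `compare_tokenizers("東京工業大学は日本の国立大学です。")` downloads both tokenizers and returns one ratio per repository. A lower ratio means fewer tokens for the same text, so the comparison gives a rough sense of what vocabulary expansion changes for a given input.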
Key Capabilities
- Strong Japanese Language Performance: Benchmarks show significant improvements over Llama 2 13B on various Japanese tasks, including JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, and MGSM.
- Retained English Capabilities: While optimized for Japanese, the model generally maintains competitive performance on English benchmarks compared to its Llama 2 base, though some scores may be slightly lower.
- Llama 2 Foundation: Benefits from the robust architecture and pre-training of the Llama 2 family.
Suitable Use Cases
- Japanese Language Applications: Ideal for tasks requiring high proficiency in Japanese text understanding, generation, and reasoning.
- Research and Development: Suitable for researchers exploring cross-lingual adaptation and the impact of continual pre-training on specific languages.
- Base Model for Fine-tuning: Can serve as a strong base for further fine-tuning on specific Japanese-language downstream tasks, especially if vocabulary expansion is not desired or causes compatibility issues with existing pipelines.
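For any of the uses above, the model first has to be loaded and run. The sketch below shows one possible way to do that with the Hugging Face `transformers` library; it assumes `torch`, `transformers`, and `accelerate` (needed for `device_map="auto"`) are installed and that enough GPU memory is available for a 13B model in bfloat16. The sampling values and the helper names `build_generation_kwargs` and `generate` are illustrative assumptions, not settings recommended by the model authors.

```python
# Repository id taken from this page.
MODEL_ID = "tokyotech-llm/Swallow-13b-NVE-hf"


def build_generation_kwargs(max_new_tokens: int = 128, temperature: float = 0.7) -> dict:
    """Build sampling settings; the default values are illustrative assumptions."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": 0.95,
    }


def generate(prompt: str, **overrides) -> str:
    """Load the model and return the decoded prompt plus its continuation."""
    # Heavy imports are kept local so the helper above stays importable without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: hardware with bfloat16 support
        device_map="auto",           # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    kwargs = {**build_generation_kwargs(), **overrides}
    with torch.no_grad():
        output_ids = model.generate(**inputs, **kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate("東京工業大学の主なキャンパスは", max_new_tokens=64)` would continue the given Japanese text. Since the page describes this as a base for fine-tuning, plain text continuation is likely a better fit than chat-style instructions.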