tokyotech-llm/Swallow-7b-NVE-hf
The Swallow-7b-NVE-hf model by TokyoTech-LLM is a 7-billion-parameter language model continually pre-trained from the Llama 2 family with a large amount of additional Japanese data. The 'NVE' (No Vocabulary Expansion) designation means it improves Japanese capability without altering the original Llama 2 tokenizer's vocabulary. It shows strong results on a range of Japanese NLP tasks, often outperforming its Llama 2 base, while remaining competitive on English tasks.
Overview
TokyoTech-LLM's Swallow-7b-NVE-hf is a 7-billion-parameter language model built on the Llama 2 architecture. It has undergone continual pre-training with a significant addition of Japanese language data to improve its proficiency in Japanese. Unlike other Swallow variants, this 'NVE' (No Vocabulary Expansion) model retains the original Llama 2 tokenizer, pursuing performance gains through training data rather than vocabulary modifications.
Key Capabilities
- Enhanced Japanese Performance: Demonstrates notable improvements over the base Llama 2 model across various Japanese benchmarks, including JCommonsenseQA, JEMHopQA, NIILC, and JSQuAD.
- Bilingual Proficiency: While optimized for Japanese, it maintains competitive performance on English NLP tasks such as OpenBookQA, TriviaQA, and HellaSwag.
- Continual Pre-training: Benefits from additional training on diverse datasets including Japanese Wikipedia, RefinedWeb, Swallow Corpus, and The Pile.
Good for
- Applications requiring strong Japanese language understanding and generation.
- Researchers and developers looking for a Llama 2-based model with specialized Japanese capabilities.
- Use cases where maintaining the original Llama 2 tokenizer is preferred.
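Because the NVE variant keeps the stock Llama 2 tokenizer, it loads through the standard Hugging Face `AutoTokenizer`/`AutoModelForCausalLM` classes with no custom code. A minimal sketch, assuming the `transformers` and `torch` packages are installed and the machine has enough memory for a 7B model in fp16 (the prompt text and generation settings here are illustrative):

```python
# Sketch: loading Swallow-7b-NVE-hf with Hugging Face transformers.
# The NVE model uses the original Llama 2 tokenizer, so the Auto* classes
# resolve everything from the hub checkpoint without extra configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tokyotech-llm/Swallow-7b-NVE-hf"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Continue `prompt` with the model (this is a base model, not chat-tuned,
    so plain text continuation works better than instruction-style prompts)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # halves memory vs fp32; use "auto" if unsure
        device_map="auto",          # spreads layers across available GPUs/CPU
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Japanese prompt: "The charm of Tokyo lies in" (continuation-style).
    print(generate("東京の魅力は"))
```

Since the tokenizer is unchanged from Llama 2, any pipeline already built around a Llama 2 checkpoint (quantization configs, fine-tuning scripts, serving stacks) should accept this model by swapping in the model ID above.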