tokyotech-llm/Swallow-7b-NVE-hf

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Nov 30, 2023 · License: llama2 · Architecture: Transformer · Open weights

Swallow-7b-NVE-hf by TokyoTech-LLM is a 7-billion-parameter language model continually pre-trained from Llama 2 with a substantial amount of additional Japanese-language data. The 'NVE' (No Vocabulary Expansion) variant improves Japanese capability without altering the original Llama 2 tokenizer's vocabulary. It performs strongly on a range of Japanese NLP tasks, often outperforming its Llama 2 base, while remaining competitive on English tasks.

Overview

TokyoTech-LLM's Swallow-7b-NVE-hf is a 7-billion-parameter language model built on the Llama 2 architecture and continually pre-trained with a large addition of Japanese-language data. Unlike other Swallow variants, this 'NVE' (No Vocabulary Expansion) model retains the original Llama 2 tokenizer, gaining its improvements from training data rather than vocabulary modifications.
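The model loads through the standard Hugging Face `transformers` causal-LM interface. The sketch below is a minimal example, not an official recipe: the Japanese prompt and the sampling settings (`max_new_tokens`, `temperature`, `top_p`) are illustrative choices, and loading in `bfloat16` with `device_map="auto"` assumes a GPU with roughly 16 GB of free memory for the 7B weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tokyotech-llm/Swallow-7b-NVE-hf"

# NVE keeps the stock Llama 2 tokenizer, so no custom tokenizer code is needed.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # assumption: a GPU with ~16 GB of free memory
    device_map="auto",
)

# This is a base (non-instruct) model, so use completion-style prompting.
prompt = "東京工業大学の主なキャンパスは、"  # illustrative Japanese prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,  # illustrative sampling settings
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```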

Key Capabilities

  • Enhanced Japanese Performance: Demonstrates notable improvements over the base Llama 2 model across various Japanese benchmarks, including JCommonsenseQA, JEMHopQA, NIILC, and JSQuAD.
  • Bilingual Proficiency: While optimized for Japanese, it maintains competitive performance on English NLP tasks such as OpenBookQA, TriviaQA, and HellaSwag.
  • Continual Pre-training: Benefits from additional training on diverse datasets including Japanese Wikipedia, RefinedWeb, Swallow Corpus, and The Pile.

Good for

  • Applications requiring strong Japanese language understanding and generation.
  • Researchers and developers looking for a Llama 2-based model with specialized Japanese capabilities.
  • Use cases where keeping the original Llama 2 tokenizer is preferred (a quick tokenizer check illustrating this follows below).
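Because 'NVE' means no vocabulary expansion, the published checkpoint should ship the unmodified 32,000-token Llama 2 SentencePiece tokenizer. The snippet below is a small sanity check of that assumption rather than anything from the official model card.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-7b-NVE-hf")

# No Vocabulary Expansion: the vocabulary should match Llama 2's original
# 32,000 SentencePiece tokens (the vocabulary-expanded Swallow variants differ).
assert tokenizer.vocab_size == 32000, tokenizer.vocab_size

# Without an expanded Japanese vocabulary, Japanese text splits into more
# (often sub-character, byte-level) pieces than under an expanded tokenizer.
print(tokenizer.tokenize("東京工業大学"))
```

The trade-off is longer token sequences for Japanese text; in exchange, the model stays drop-in compatible with tooling built around the stock Llama 2 tokenizer.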