tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1

Parameters: 8B · Precision: FP8 · Context length: 32,768 tokens
License: llama3.1
Overview

Llama 3.1 Swallow 8B Instruct v0.1: Enhanced Japanese Capabilities

This model is an 8-billion-parameter instruction-tuned variant from the Llama 3.1 Swallow series, developed by tokyotech-llm. It was built by continual pre-training of the Meta Llama 3.1 base model, with a focus on significantly enhancing Japanese language capabilities while retaining strong English performance.

Key Capabilities

  • Bilingual Proficiency: Excels in both Japanese and English, with a particular focus on Japanese language tasks.
  • Continual Pre-training: Utilizes approximately 200 billion tokens from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical/coding content.
  • Instruction-Tuned: Supervised fine-tuning (SFT) was performed using synthetic data specifically designed for Japanese.
  • Strong Japanese Benchmarks: Achieves leading scores on Japanese evaluation benchmarks including JCommonsenseQA, JEMHopQA, NIILC, and JSQuAD, and competitive performance on MT-Bench JA.
  • Llama 3.1 Foundation: Benefits from the robust architecture and tokenizer of the Meta Llama 3.1 models.

Good for

  • Applications requiring high-quality Japanese language understanding and generation.
  • Bilingual (Japanese-English) conversational AI and instruction-following tasks.
  • Research and development in cross-lingual LLM adaptation, particularly for Japanese.
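As a minimal usage sketch, the model can be driven through the standard Hugging Face `transformers` chat-template flow used by Llama 3.1-style instruct models. The system/user messages and the generation parameters (`max_new_tokens`, `temperature`, `top_p`) below are illustrative assumptions, not official recommendations from the model authors:

```python
MODEL_ID = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"

# Chat turns follow the standard role/content schema consumed by
# Llama 3.1 chat templates. Example prompt: a Japanese system prompt
# ("You are a sincere and excellent Japanese assistant.") and a user
# question ("Where is Tokyo Institute of Technology located?").
messages = [
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {"role": "user", "content": "東京工業大学はどこにありますか？"},
]


def generate(chat):
    # Heavy dependencies are imported lazily so that merely importing
    # this module does not require transformers/torch to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Loading an 8B model needs a GPU with sufficient memory; the
    # dtype/device settings here are illustrative.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # apply_chat_template renders the messages into the model's
    # prompt format and appends the assistant header.
    input_ids = tokenizer.apply_chat_template(
        chat, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(
        input_ids, max_new_tokens=256, do_sample=True,
        temperature=0.6, top_p=0.9,
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate(messages))
```

Since the SFT data was designed for Japanese, prompting in Japanese (as above) plays to the model's strengths, though English prompts work as well.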