tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4

Public · 70B parameters · FP8 · 32768-token context · License: llama3.3

Llama-3.3-Swallow-70B-Instruct-v0.4 Overview

Llama-3.3-Swallow-70B-Instruct-v0.4 is a 70-billion-parameter instruction-tuned model from tokyotech-llm, built by continual pre-training from the Meta Llama 3.3 base model. Its primary goal is to substantially improve Japanese language capabilities while preserving the strong English performance of the original Llama 3.3.
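
The model is distributed as a standard Hugging Face checkpoint. Below is a minimal loading sketch using the transformers library; the dtype and device settings are illustrative assumptions rather than the authors' recommended configuration.

    # Minimal sketch: load the model with Hugging Face transformers.
    # Assumes the standard Llama architecture inherited from Meta Llama 3.3;
    # dtype/device settings are illustrative, not official recommendations.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # a 70B model generally needs multiple GPUs or offloading
        device_map="auto",
    )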

Key Capabilities & Training

  • Bilingual Proficiency: Excels in both Japanese and English, making it a versatile choice for multilingual applications.
  • Continual Pre-training: Utilized approximately 315 billion tokens from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical/coding content.
  • Instruction Tuning: Fine-tuned using supervised fine-tuning (SFT) on synthetic data specifically designed for Japanese, including datasets like Gemma-2-LMSYS-Chat-1M-Synth, Swallow-Magpie-Ultra-v0.1, Swallow-Gemma-Magpie-v0.1, and Swallow-Code-v0.3-Instruct-style.
  • Context Length: Supports a context length of 32768 tokens (see the usage sketch after this list).
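
Continuing from the loading sketch above, the following illustrates Japanese instruction following through the tokenizer's chat template. The prompts and sampling parameters are illustrative assumptions; the only presumption is that the model ships a Llama-3.3-style chat template in its tokenizer configuration.

    # Hedged sketch: Japanese instruction following via the chat template
    # (continues from the loading sketch above; prompts and sampling settings are illustrative).
    messages = [
        {"role": "system", "content": "あなたは誠実で優秀な日本語アシスタントです。"},  # "You are a sincere and capable Japanese assistant."
        {"role": "user", "content": "日本語と英語の両方で自己紹介してください。"},      # "Introduce yourself in both Japanese and English."
    ]

    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(
        input_ids,
        max_new_tokens=512,   # well within the 32768-token context window
        do_sample=True,
        temperature=0.6,      # illustrative sampling settings
        top_p=0.9,
    )
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))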

Performance Highlights

  • MT-Bench JA: Achieves an average score of 0.772 on the Japanese MT-Bench, demonstrating strong Japanese multi-turn dialogue performance and outperforming Llama 3.3 70B Instruct (0.737).
  • Japanese Benchmarks: Shows competitive results across various Japanese tasks, including JCommonsenseQA (0.981 EM acc), WMT20-en-ja (0.319 BLEU), and WMT20-ja-en (0.261 BLEU).
  • English Benchmarks: Maintains solid performance on English tasks, with notable scores on GSM8K (0.908 EM acc) and HumanEval (0.750 pass@1).

Good for

  • Applications requiring high-quality Japanese language generation and understanding.
  • Bilingual (Japanese-English) conversational AI and instruction-following tasks.
  • Developers seeking a Llama 3.3-based model with substantially stronger Japanese language support.