tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2

Parameters: 8B
Quantization: FP8
Context length: 32768
Released: Oct 30, 2024
License: llama3.1
Source: Hugging Face

Llama-3.1-Swallow-8B-Instruct-v0.2 Overview

This model is an 8-billion-parameter instruction-tuned variant from the Llama 3.1 Swallow series, developed by tokyotech-llm. It was built by continual pre-training of the original Meta Llama 3.1 model, enhancing Japanese language proficiency while preserving English capabilities. Pre-training used approximately 200 billion tokens drawn from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical and coding content.

Key Capabilities & Features

  • Enhanced Japanese Performance: Achieves a Japanese average score of 0.5141 across various benchmarks, outperforming other Llama 3 and Qwen2 models in its size class.
  • Strong English Performance: Maintains competitive English task performance with an average score of 0.5823.
  • Instruction-Tuned: Fine-tuned using synthetic Japanese and English datasets, including Llama-3.1-LMSYS-Chat-1M-Synth-Ja, Swallow-Magpie-Ultra-v0.1, and filtered-magpie-ultra-en.
  • Multi-turn Dialogue: Demonstrates solid performance on Japanese MT-Bench (MT-Bench JA), scoring 0.5584.

Good For

  • Applications requiring robust performance in both Japanese and English.
  • Tasks involving question answering, summarization, and code generation in Japanese contexts.
  • Multi-turn conversational AI systems targeting Japanese users.
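As a quick-start illustration, the sketch below shows one way to run chat inference with this model via Hugging Face transformers. The model ID comes from this card; the system prompt, generation settings, and the RUN_INFERENCE guard are illustrative assumptions, not settings recommended by the model's authors.

```python
# Minimal chat-inference sketch for Llama-3.1-Swallow-8B-Instruct-v0.2.
# The message contents below are illustrative; the system prompt is an
# assumed Japanese-assistant persona, not one prescribed by the model card.

MODEL_ID = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2"

# Japanese chat turn: "Please briefly explain Japan's four seasons."
messages = [
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {"role": "user", "content": "日本の四季について簡単に説明してください。"},
]

RUN_INFERENCE = False  # flip to True on a machine with the weights and a GPU

if RUN_INFERENCE:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Format the multi-turn conversation with the model's chat template.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; for conversational use, sampling with a moderate temperature is a common alternative.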