tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Dec 18, 2024License:llama3.1Architecture:Transformer0.0K Warm

The Llama-3.1-Swallow-8B-Instruct-v0.3 model by tokyotech-llm is an 8 billion parameter instruction-tuned large language model, continually pre-trained from Meta's Llama 3.1. It significantly enhances Japanese language capabilities while retaining strong English performance, utilizing approximately 200 billion tokens from Japanese web corpora, Wikipedia, and technical content. This model excels in multi-turn Japanese dialogue, achieving state-of-the-art performance on Japanese MT-Bench among open-source LLMs of comparable size.

Loading preview...

Llama-3.1-Swallow-8B-Instruct-v0.3 Overview

This model is an 8 billion parameter instruction-tuned variant from the Llama 3.1 Swallow series, developed by tokyotech-llm. It is built upon Meta's Llama 3.1 base models through continual pre-training with a focus on enhancing Japanese language capabilities while maintaining strong English performance. The pre-training involved approximately 200 billion tokens from diverse sources, including a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical/coding content.

Key Capabilities & Features

  • Bilingual Proficiency: Significantly improved Japanese language understanding and generation, alongside robust English capabilities.
  • Instruction-Tuned: Optimized for following user instructions and engaging in multi-turn conversations, achieved through supervised fine-tuning on specially built synthetic Japanese data.
  • State-of-the-Art Japanese MT-Bench Performance: This v0.3 release demonstrates leading performance on Japanese MT-Bench among open-source LLMs with 8 billion parameters or less, showing an 8.4-point improvement over its predecessor.
  • Llama 3.1 Architecture: Leverages the architectural strengths of the Meta Llama 3.1 series.

Ideal Use Cases

  • Japanese-centric Applications: Excellent for chatbots, content generation, and conversational AI systems requiring high proficiency in Japanese.
  • Bilingual AI Assistants: Suitable for applications that need to seamlessly handle both Japanese and English interactions.
  • Research and Development: A strong foundation for further fine-tuning or research into cross-lingual LLM adaptation, particularly for Japanese language tasks.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p