mmnga/Llama-3-70B-japanese-suzume-vector-v0.1

Apr 28, 2024
License: llama3
Overview

mmnga/Llama-3-70B-japanese-suzume-vector-v0.1: Experimental Japanese Integration

Developed by mmnga, this is an experimental 70-billion-parameter Llama-3-based model that aims to integrate Japanese language capabilities into the meta-llama/Meta-Llama-3-70B-Instruct architecture. It uses a chat-vector approach to transfer the linguistic adaptation learned by a smaller Japanese model.

Key Capabilities & Methodology

  • Japanese Language Adaptation: Aims to enhance the Japanese understanding and generation capabilities of the Llama-3-70B-Instruct model.
  • Chat-Vector Approach: The core methodology involves:
    • Calculating the weight difference (the "chat vector") between lightblue/suzume-llama-3-8B-japanese and its base model, meta-llama/Meta-Llama-3-8B-Instruct.
    • Upsampling this difference vector to match the parameter shapes of meta-llama/Meta-Llama-3-70B-Instruct.
    • Applying this upsampled difference to the 70B model, specifically targeting the middle layers while keeping the first and last 8 layers intact.
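As a rough illustration, the three steps above can be sketched in PyTorch with tiny random tensors standing in for the real models. The zero-padding "upsample" and the proportional small-to-big layer mapping here are placeholders of my own, since the card does not document the actual scheme; layer counts and shapes are toy values.

```python
import torch

n_small, n_big = 4, 10   # toy layer counts standing in for 32 and 80
d_small, d_big = 8, 16   # toy hidden sizes

# Toy per-layer weights standing in for the three real checkpoints.
base_8b  = [torch.randn(d_small, d_small) for _ in range(n_small)]
suzume   = [w + 0.1 * torch.randn_like(w) for w in base_8b]
base_70b = [torch.randn(d_big, d_big) for _ in range(n_big)]

# 1. Chat vector: suzume minus its 8B instruct base, per layer.
diff = [s - b for s, b in zip(suzume, base_8b)]

# 2. "Upsample" each diff to the 70B shape. Zero-padding the smaller
#    tensor into the larger one is purely a placeholder here.
def upsample(v: torch.Tensor, shape) -> torch.Tensor:
    out = torch.zeros(shape)
    out[: v.shape[0], : v.shape[1]] = v
    return out

# Map small-model layers onto big-model layers by proportional index
# (another assumption, not the documented method).
def layer_diff(i_big: int) -> torch.Tensor:
    i_small = min(i_big * n_small // n_big, n_small - 1)
    return upsample(diff[i_small], base_70b[i_big].shape)

# 3. Apply only to the middle layers; the first and last `skip` layers
#    stay intact (skip=2 here stands in for 8 in the real recipe).
skip = 2
merged = [
    w + layer_diff(i) if skip <= i < n_big - skip else w.clone()
    for i, w in enumerate(base_70b)
]
```

The outer layers are left untouched, mirroring the card's note that the first and last 8 layers of the 70B model are kept intact.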

Current Status & Limitations

  • Experimental Nature: This is an ongoing experiment, and initial results indicate that the applied differences had a minimal impact on the 70B model's performance. The developer plans to explore scaling factors for future iterations.
  • No Specific Benchmarks: The current README does not provide specific performance benchmarks or evaluation metrics, as it is an exploratory project.
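The scaling-factor idea mentioned above can be sketched as a single tunable coefficient on the chat vector before it is added. The name `alpha` is my own; the card does not specify how the factor would be parameterized.

```python
import torch

# Hypothetical scaled merge: weight the upsampled chat vector by
# `alpha` instead of adding it directly. alpha=1.0 reproduces the
# plain addition; alpha>1 amplifies the transferred adaptation,
# alpha<1 dampens it.
def merge_with_scale(base_weight: torch.Tensor,
                     chat_vector: torch.Tensor,
                     alpha: float = 1.0) -> torch.Tensor:
    return base_weight + alpha * chat_vector

w = torch.ones(2, 2)
v = 0.5 * torch.ones(2, 2)
merged = merge_with_scale(w, v, alpha=2.0)  # every entry becomes 2.0
```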

Good For

  • Researchers and developers interested in transfer learning techniques for large language models, particularly for language adaptation.
  • Experimentation with vector-based fine-tuning and parameter injection methods.
  • Exploring the challenges and potential of integrating specific language capabilities into pre-trained large models like Llama-3.