mmnga/Llama-3-70B-japanese-suzume-vector-v0.1

Text generation · Concurrency cost: 4 · Model size: 70B · Quant: FP8 · Context length: 8K · Published: Apr 28, 2024 · License: llama3 · Architecture: Transformer

mmnga/Llama-3-70B-japanese-suzume-vector-v0.1 is an experimental 70-billion-parameter Llama-3-based model developed by mmnga, designed to integrate Japanese language capabilities into Meta-Llama-3-70B-Instruct. It applies a chat-vector approach: the weight difference between a Japanese-tuned 8B model and the base Llama-3-8B-Instruct is extracted, upsampled, and added to the larger 70B model. Its primary purpose is to explore how language-specific fine-tuning can be transferred from smaller models to larger Llama-3 variants for improved Japanese language processing.


mmnga/Llama-3-70B-japanese-suzume-vector-v0.1: Experimental Japanese Integration

This model, developed by mmnga, is an experimental 70-billion-parameter Llama-3-based model focused on integrating Japanese language capabilities into the meta-llama/Meta-Llama-3-70B-Instruct architecture. It uses a chat-vector approach to transfer language-specific behavior from a smaller fine-tuned model.

Key Capabilities & Methodology

  • Japanese Language Adaptation: Aims to enhance the Japanese understanding and generation capabilities of the Llama-3-70B-Instruct model.
  • Chat-Vector Approach: The core methodology involves:
    • Calculating the per-weight difference (the chat vector) between the Japanese-tuned lightblue/suzume-llama-3-8B-japanese and its base, meta-llama/Meta-Llama-3-8B-Instruct.
    • Upsampling this difference vector to match the parameter shapes of meta-llama/Meta-Llama-3-70B-Instruct.
    • Applying this upsampled difference to the 70B model, specifically targeting the middle layers while keeping the first and last 8 layers intact.

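The transfer procedure above can be sketched with toy tensors. Everything here is an invented illustration, not the repository's actual merge code: the layer counts, weight shapes, the constant 0.1 "fine-tune" offset, and the tiling-based upsampler are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the three real models (shapes and layer counts are invented).
small_base  = [rng.standard_normal((2, 2)) for _ in range(4)]  # "8B base"
small_tuned = [w + 0.1 for w in small_base]                    # "Japanese-tuned 8B"
large       = [rng.standard_normal((4, 4)) for _ in range(8)]  # "70B target"

# 1. Chat vector: per-layer difference between the tuned and base small models.
diffs = [t - b for t, b in zip(small_tuned, small_base)]

# 2. Upsample each difference to the large model's weight shape (here by tiling).
def upsample(d, shape):
    reps = (shape[0] // d.shape[0], shape[1] // d.shape[1])
    return np.tile(d, reps)

# 3. Add the upsampled differences to the middle layers only, leaving the
#    first and last `keep` layers untouched (the model card keeps 8 each side).
keep = 2
merged = []
for i, w in enumerate(large):
    if keep <= i < len(large) - keep:
        d = diffs[i * len(diffs) // len(large)]  # map large-layer index to a small-layer index
        merged.append(w + upsample(d, w.shape))
    else:
        merged.append(w)
```

Preserving the outermost layers unchanged mirrors the model card's note that the first and last 8 layers of the 70B model are kept intact while the chat vector is injected into the middle layers.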
Current Status & Limitations

  • Experimental Nature: This is an ongoing experiment, and initial results indicate that the applied differences had a minimal impact on the 70B model's performance. The developer plans to explore scaling factors for future iterations.
  • No Specific Benchmarks: The current README does not provide specific performance benchmarks or evaluation metrics, as it is an exploratory project.

Good For

  • Researchers and developers interested in transfer learning techniques for large language models, particularly for language adaptation.
  • Experimentation with vector-based fine-tuning and parameter injection methods.
  • Exploring the challenges and potential of integrating specific language capabilities into pre-trained large models like Llama-3.

Popular Sampler Settings

The most common sampler configurations used by Featherless users for this model adjust the following parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.