mmnga/Llama-3-70B-japanese-suzume-vector-v0.1

Apr 28, 2024
License: llama3
Overview

mmnga/Llama-3-70B-japanese-suzume-vector-v0.1: Experimental Japanese Integration

Developed by mmnga, this is an experimental 70-billion-parameter Llama-3-based model that aims to integrate Japanese language capabilities into the meta-llama/Meta-Llama-3-70B-Instruct architecture. It uses a chat-vector approach to transfer the linguistic adaptation learned by a smaller Japanese model.

Key Capabilities & Methodology

  • Japanese Language Adaptation: Aims to enhance the Japanese understanding and generation capabilities of the Llama-3-70B-Instruct model.
  • Chat-Vector Approach: The core methodology involves:
    • Calculating the weight difference (the "chat vector") between lightblue/suzume-llama-3-8B-japanese and its base model, meta-llama/Meta-Llama-3-8B-Instruct.
    • Upsampling this difference vector to match the parameter shapes of meta-llama/Meta-Llama-3-70B-Instruct.
    • Applying this upsampled difference to the 70B model, specifically targeting the middle layers while keeping the first and last 8 layers intact.
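As a rough illustration, the three steps above can be sketched in PyTorch with tiny random tensors standing in for the real models. The zero-padding "upsample" and the proportional small-to-big layer mapping here are placeholders of my own, since the card does not document the actual scheme; layer counts and shapes are toy values.

```python
import torch

n_small, n_big = 4, 10   # toy layer counts standing in for 32 and 80
d_small, d_big = 8, 16   # toy hidden sizes

# Toy per-layer weights standing in for the three real checkpoints.
base_8b  = [torch.randn(d_small, d_small) for _ in range(n_small)]
suzume   = [w + 0.1 * torch.randn_like(w) for w in base_8b]
base_70b = [torch.randn(d_big, d_big) for _ in range(n_big)]

# 1. Chat vector: suzume minus its 8B instruct base, per layer.
diff = [s - b for s, b in zip(suzume, base_8b)]

# 2. "Upsample" each diff to the 70B shape. Zero-padding the smaller
#    tensor into the larger one is purely a placeholder here.
def upsample(v: torch.Tensor, shape) -> torch.Tensor:
    out = torch.zeros(shape)
    out[: v.shape[0], : v.shape[1]] = v
    return out

# Map small-model layers onto big-model layers by proportional index
# (another assumption, not the documented method).
def layer_diff(i_big: int) -> torch.Tensor:
    i_small = min(i_big * n_small // n_big, n_small - 1)
    return upsample(diff[i_small], base_70b[i_big].shape)

# 3. Apply only to the middle layers; the first and last `skip` layers
#    stay intact (skip=2 here stands in for 8 in the real recipe).
skip = 2
merged = [
    w + layer_diff(i) if skip <= i < n_big - skip else w.clone()
    for i, w in enumerate(base_70b)
]
```

The outer layers are left untouched, mirroring the card's note that the first and last 8 layers of the 70B model are kept intact.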

Current Status & Limitations

  • Experimental Nature: This is an ongoing experiment, and initial results indicate that the applied differences had a minimal impact on the 70B model's performance. The developer plans to explore scaling factors for future iterations.
  • No Specific Benchmarks: The current README does not provide specific performance benchmarks or evaluation metrics, as it is an exploratory project.
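The scaling-factor idea mentioned above can be sketched as a single tunable coefficient on the chat vector before it is added. The name `alpha` is my own; the card does not specify how the factor would be parameterized.

```python
import torch

# Hypothetical scaled merge: weight the upsampled chat vector by
# `alpha` instead of adding it directly. alpha=1.0 reproduces the
# plain addition; alpha>1 amplifies the transferred adaptation,
# alpha<1 dampens it.
def merge_with_scale(base_weight: torch.Tensor,
                     chat_vector: torch.Tensor,
                     alpha: float = 1.0) -> torch.Tensor:
    return base_weight + alpha * chat_vector

w = torch.ones(2, 2)
v = 0.5 * torch.ones(2, 2)
merged = merge_with_scale(w, v, alpha=2.0)  # every entry becomes 2.0
```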

Good For

  • Researchers and developers interested in transfer learning techniques for large language models, particularly for language adaptation.
  • Experimentation with vector-based fine-tuning and parameter injection methods.
  • Exploring the challenges and potential of integrating specific language capabilities into pre-trained large models like Llama-3.