KaraKaraWitch/L3.1-70b-MeowMix

70B · FP8 · 32768 context · Aug 23, 2024 · Hugging Face

L3.1-70b-MeowMix by KaraKaraWitch is a 70 billion parameter merged language model based on the Llama 3.1 architecture. It is a merge of several Llama 3.1-70B variants, including Tess-3, EZO-1.1-it, Chinese-Chat, and Korean-sft-dpo models, created using the TIES merge method. While it functions well for English language tasks, it is explicitly noted as a failed attempt for CJK (Chinese, Japanese, Korean) languages, performing poorly in those contexts.

Overview

L3.1-70b-MeowMix: A Merged Llama 3.1-70B Model

L3.1-70b-MeowMix is a 70 billion parameter language model developed by KaraKaraWitch, created by merging multiple Llama 3.1-70B models with the TIES merge method via LazyMergekit.

Key Characteristics & Composition

This model integrates components from:

  • migtissera/Tess-3-Llama-3.1-70B
  • HODACHI/Llama-3.1-70B-EZO-1.1-it
  • shenzhi-wang/Llama3.1-70B-Chinese-Chat
  • Saxo/Linkbricks-Horizon-AI-Korean-llama3.1-sft-dpo-70B

The merging process aimed to combine the strengths of these diverse models, with migtissera/Tess-3-Llama-3.1-70B serving as the base model and receiving a higher density and weight than the other components in the merge configuration.
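As a rough sketch, a TIES merge of this composition via mergekit/LazyMergekit might be configured as follows. The density and weight values below are illustrative assumptions, not KaraKaraWitch's published settings:

```yaml
# Hypothetical mergekit config for a TIES merge of this composition.
# Density/weight values are assumptions for illustration only.
models:
  - model: migtissera/Tess-3-Llama-3.1-70B
    parameters:
      density: 0.7   # base model, higher density/weight per the model card
      weight: 0.5
  - model: HODACHI/Llama-3.1-70B-EZO-1.1-it
    parameters:
      density: 0.5
      weight: 0.2
  - model: shenzhi-wang/Llama3.1-70B-Chinese-Chat
    parameters:
      density: 0.5
      weight: 0.15
  - model: Saxo/Linkbricks-Horizon-AI-Korean-llama3.1-sft-dpo-70B
    parameters:
      density: 0.5
      weight: 0.15
merge_method: ties
base_model: migtissera/Tess-3-Llama-3.1-70B
dtype: bfloat16
```

In a TIES merge, `density` controls what fraction of each model's task-vector parameters is retained before sign-consensus merging, and `weight` scales each model's contribution.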

Important Limitation: CJK Language Performance

Crucially, L3.1-70b-MeowMix is explicitly identified as a failed attempt for CJK (Chinese, Japanese, Korean) languages. While it performs adequately for English, users should not use this model for CJK-related tasks as its performance in these languages is significantly compromised. This limitation stems from the merging strategy, which did not yield the desired multilingual capabilities for CJK.

Chat Format

The model utilizes the standard Llama 3 Instruct chat format for interactions.
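For reference, the Llama 3 Instruct format wraps each turn in header and end-of-turn tokens. The helper below is a minimal hand-rolled sketch of that template; in practice, `tokenizer.apply_chat_template` from the `transformers` library renders it for you:

```python
def format_llama3_chat(messages):
    """Render a list of {role, content} dicts in the Llama 3 Instruct format."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Open an assistant header to cue the model to generate its reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = format_llama3_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
```

Generation should be stopped on `<|eot_id|>`, the model's end-of-turn token.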