p208p2002/llama-3-zhtw-8B

  • Parameters: 8B
  • Quantization: FP8
  • Context length: 8192
  • License: llama3
Overview

Llama 3 zhtw: A Traditional Chinese Enhanced Llama 3 Model

p208p2002/llama-3-zhtw-8B is an 8-billion-parameter model based on the Llama 3 architecture, developed by p208p2002. It underwent continued pre-training (CP) on an additional 800 million tokens, aimed specifically at enhancing its Traditional Chinese capability.

Key Characteristics

  • Multilingual Capability: Although the continued pre-training targets Traditional Chinese, the model retains English performance comparable to the original Llama 3, which is attributed to keeping the FineWeb dataset in the training mix.
  • Training Data: The training recipe mixes FineWeb (English), Wudao (Simplified Chinese), C4Tw, WikiZhTw, and NdltdT10 (Traditional Chinese), plus GitHubMarkDown and GitHubPython (code).
  • Context Length: Supports sequences of up to 8192 tokens (a loading sketch follows this list).
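
The snippet below is a minimal loading-and-generation sketch using the Hugging Face transformers library; the dtype, device placement, and prompt are illustrative assumptions, not settings documented on the model card.

```python
# Minimal loading-and-generation sketch; assumes transformers >= 4.40
# (Llama 3 support) and hardware with enough memory for an 8B model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "p208p2002/llama-3-zhtw-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative choice; use what your hardware supports
    device_map="auto",
)

# Inputs may be up to the model's 8192-token window.
prompt = "台灣最高的山是"  # "The highest mountain in Taiwan is"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8192).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```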

Performance Insights

Benchmarking indicates that the Traditional Chinese CP did not surpass the original Llama 3 on Chinese benchmarks, but its English performance remains competitive: the model scores 65.31 on MMLU (EN, Knowledge), slightly above the base Meta-Llama-3-8B's 65.17.
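
The card does not state the exact evaluation setup behind these figures. As a hedged reproduction sketch, the snippet below runs MMLU through EleutherAI's lm-evaluation-harness (v0.4+ Python API); the task name, few-shot count, and batch size are all assumptions.

```python
# Hedged reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval, v0.4+). Task name, few-shot count, and batch
# size are assumptions; the card does not document the evaluation config.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=p208p2002/llama-3-zhtw-8B,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])  # per-task and aggregate accuracies
```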

Use Cases

This model suits developers and applications that need a robust Llama 3-based model with enhanced Traditional Chinese understanding and no loss of its strong English capability. It is particularly useful for content generation, translation, and analysis spanning both English and Traditional Chinese.
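
As a concrete illustration of the bilingual use case, the sketch below reuses the `model` and `tokenizer` from the earlier loading sketch for few-shot English-to-Traditional-Chinese translation; the prompt format and example pairs are invented for illustration, and since this is a base (non-instruct) checkpoint, it completes text rather than following instructions.

```python
# Few-shot English -> Traditional Chinese translation prompt, reusing the
# `model` and `tokenizer` loaded in the earlier sketch. The prompt format
# and example pairs are invented for illustration.
few_shot = (
    "English: Good morning.\n中文: 早安。\n\n"
    "English: The weather is nice today.\n中文: 今天天氣很好。\n\n"
    "English: Where is the train station?\n中文:"
)
inputs = tokenizer(few_shot, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
# Print only the newly generated tokens (the model's translation).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```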