p208p2002/llama-3-zhtw-8B

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · License: llama3 · Architecture: Transformer

p208p2002/llama-3-zhtw-8B is an 8 billion parameter Llama 3-based language model developed by p208p2002, continually pre-trained on 800M additional tokens to strengthen its Traditional Chinese (zhtw) capabilities. Because the continued pre-training mix includes FineWeb alongside Chinese and code datasets, it retains the original Llama 3's English MMLU performance. The model targets applications that need strong English language understanding alongside Traditional Chinese processing, offering a balanced performance profile.


Llama 3 zhtw: A Traditional Chinese Enhanced Llama 3 Model

p208p2002/llama-3-zhtw-8B is an 8 billion parameter model based on the Llama 3 architecture, developed by p208p2002. It underwent continued pre-training (CP) with an additional 800 million tokens, specifically targeting Traditional Chinese language enhancement.

Key Characteristics

  • Multilingual Capability: While primarily focused on Traditional Chinese, the model maintains strong English language performance, comparable to the original Llama 3, attributed to its continued pre-training on the FineWeb dataset.
  • Training Data: The model's training recipe includes a diverse mix of datasets such as FineWeb (English), Wudao (Simplified Chinese), C4Tw, WikiZhTw, NdltdT10 (Traditional Chinese), and GitHubMarkDown/GitHubPython (code).
  • Context Length: Supports a sequence length of 8192 tokens.
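Prompts longer than the 8192-token window must be trimmed before generation. A minimal sketch of the usual approach, keeping the most recent tokens and reserving room for the reply (the helper name and structure here are illustrative, not part of the model's tooling):

```python
MAX_CTX = 8192  # llama-3-zhtw-8B context window

def fit_to_context(token_ids, max_new_tokens, max_ctx=MAX_CTX):
    """Trim a prompt so that prompt + generated tokens fit the
    context window, keeping the most recent tokens (the usual
    choice for chat-style use)."""
    budget = max_ctx - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    # Keep the last `budget` tokens; older context is dropped.
    return token_ids[-budget:]
```

For example, a 9000-token prompt with 256 tokens reserved for generation would be cut down to its final 7936 tokens.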

Performance Insights

Benchmarking indicates that while the Traditional Chinese continued pre-training did not lift the model above the original Llama 3 on Chinese benchmarks, its English MMLU scores remain competitive. For instance, it achieves an MMLU (EN, Knowledge) score of 65.31, slightly above the base Meta-Llama-3-8B's 65.17.

Use Cases

This model is suitable for developers and applications requiring a robust Llama 3-based model with enhanced Traditional Chinese language understanding, without compromising its strong English capabilities. It's particularly useful for tasks involving content generation, translation, or analysis in both English and Traditional Chinese contexts.

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p