p208p2002/llama-3-zhtw-8B
p208p2002/llama-3-zhtw-8B is an 8 billion parameter Llama 3-based language model developed by p208p2002, fine-tuned on 800M additional tokens to add Traditional Chinese (zhtw) language capabilities. Because the continued pre-training mix includes FineWeb alongside Chinese and code datasets, it retains the original Llama 3's English MMLU performance. The model targets applications that need strong English language understanding alongside Traditional Chinese processing, offering a balanced performance profile.
Llama 3 zhtw: A Traditional Chinese Enhanced Llama 3 Model
p208p2002/llama-3-zhtw-8B is an 8 billion parameter model based on the Llama 3 architecture, developed by p208p2002. It underwent continued pre-training (CP) with an additional 800 million tokens, specifically targeting Traditional Chinese language enhancement.
Key Characteristics
- Multilingual Capability: While primarily focused on Traditional Chinese, the model maintains strong English language performance, comparable to the original Llama 3, attributed to its continued pre-training on the FineWeb dataset.
- Training Data: The model's training recipe includes a diverse mix of datasets such as FineWeb (English), Wudao (Simplified Chinese), C4Tw, WikiZhTw, NdltdT10 (Traditional Chinese), and GitHubMarkDown/GitHubPython (code).
- Context Length: Supports a sequence length of 8192 tokens.
Performance Insights
Benchmarking indicates that while the continued pre-training did not push the model past the original Llama 3 on Chinese benchmarks, its English MMLU scores remain competitive. For instance, it achieves an MMLU (EN, Knowledge) score of 65.31, slightly above the base Meta-Llama-3-8B's 65.17.
Use Cases
This model is suitable for developers and applications requiring a robust Llama 3-based model with enhanced Traditional Chinese language understanding, without compromising its strong English capabilities. It's particularly useful for tasks involving content generation, translation, or analysis in both English and Traditional Chinese contexts.
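For reference, the model can be loaded like any other Llama 3 checkpoint via Hugging Face `transformers`. The snippet below is a minimal sketch: the model ID comes from this card, but the dtype, device placement, and sampling parameters are illustrative assumptions rather than recommended settings, and running it requires downloading the full 8B-parameter weights.

```python
# Minimal usage sketch with Hugging Face transformers.
# Assumptions: bf16 weights, automatic device placement, and
# arbitrary sampling parameters chosen for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "p208p2002/llama-3-zhtw-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs fp32 for an 8B model
    device_map="auto",
)

# A Traditional Chinese prompt ("Taiwan's highest mountain is"),
# exercising the zhtw capability the card describes.
prompt = "台灣最高的山是"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the card lists a sequence length of 8192 tokens, so prompts plus generated tokens should stay within that budget.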