Overview
yentinglin/Llama-3-Taiwan-70B-Instruct is a 70-billion-parameter language model based on the Llama-3 architecture and fine-tuned for Traditional Mandarin and English. Developed by yentinglin, the model was trained on a substantial corpus of general and industry-specific data spanning the legal, manufacturing, medical, and electronics domains.
Key Capabilities
- Bilingual Proficiency: Strong capabilities in Traditional Mandarin (zh-tw) and English (en).
- Comprehensive NLP: Excels in language understanding, generation, reasoning, and multi-turn dialogue.
- Domain-Specific Knowledge: Enhanced performance in specialized fields due to targeted training data.
- Function Calling: Supports function calling; constrained decoding is recommended for reliable JSON-mode output.
- Long Context: Features an 8K context length, with a 128K version also available.
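Multi-turn dialogue with a Llama-3-based model relies on the standard Llama-3 instruct prompt format. As a minimal sketch (assuming the stock Llama-3 special tokens; in practice the model tokenizer's `apply_chat_template` method should be preferred, as it applies the template shipped with the checkpoint):

```python
def format_llama3_chat(messages):
    """Serialize a multi-turn chat into the Llama-3 instruct prompt format.

    Assumes the standard Llama-3 special tokens; in practice, prefer the
    model tokenizer's apply_chat_template, which does this authoritatively.
    """
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Cue the model to generate the next assistant turn
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful Traditional Mandarin assistant."},
    {"role": "user", "content": "台灣最高的山是哪一座？"},
]
prompt = format_llama3_chat(messages)
```

The same message list can be passed directly to the tokenizer's chat-template method, which also handles tokenization of the special tokens.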
Performance Highlights
The model demonstrates competitive performance on various Traditional Mandarin NLP benchmarks. For instance, it achieves 74.76% on TMLU and 80.95% on Taiwan Truthful QA, often outperforming or closely matching other large models like Claude-3-Opus and GPT-4o on specific Traditional Mandarin tasks. It also shows strong results on Legal Eval and TMMLU+ benchmarks. The model was trained using the NVIDIA NeMo Framework on NVIDIA DGX H100 systems.
Good for
- Multi-turn Dialogue: Engaging in natural and extended conversations.
- RAG (Retrieval Augmented Generation): Enhancing responses with retrieved information, as demonstrated by its web search integration.
- Structured Output & Entity Recognition: Generating formatted outputs, understanding language nuances, and identifying entities; constrained decoding is particularly useful for reliable JSON output.
- Taiwanese Contexts: Ideal for applications requiring deep understanding and generation in Traditional Mandarin, especially within legal, medical, and industrial sectors relevant to Taiwan.
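For the structured-output use case, constrained decoding (as recommended above) is the robust approach. Absent a constrained-decoding library, a minimal fallback is to validate and extract the first parseable JSON object from the model's free-form reply; the helper below is a hypothetical sketch, not part of the model's API:

```python
import json

def extract_json_object(text):
    """Return the first parseable JSON object embedded in model output,
    or None if no balanced {...} span parses.

    A fallback sketch only; grammar-constrained decoding is more reliable
    because it prevents malformed JSON from being generated at all.
    """
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    # Balanced span found; try to parse it
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break
        start = text.find("{", start + 1)
    return None

# Example: an entity-recognition reply that wraps JSON in conversational text
reply = '好的，以下是擷取結果：{"person": "林小姐", "city": "台北"} 希望有幫助。'
entities = extract_json_object(reply)
```

Scanning for a balanced brace span before parsing tolerates the conversational preamble and postamble that instruction-tuned models often add around structured output.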