benchang1110/Qwen2.5-Taiwan-7B-Instruct

  • Parameters: 7.6B
  • Precision: FP8
  • Context length: 131,072 tokens
  • License: apache-2.0

Model Overview

benchang1110/Qwen2.5-Taiwan-7B-Instruct is a 7.6-billion-parameter language model developed by benchang1110, built on the Qwen/Qwen2.5-7B-Instruct base model. It is optimized specifically for Traditional Chinese (zh-tw), making it well suited to applications that require nuanced understanding and generation in that language.
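
For a quick start, the snippet below is a minimal sketch of the standard Hugging Face transformers chat-template workflow; the system prompt, user message, and generation settings are illustrative choices, not an official recommendation.

```python
# Minimal inference sketch (standard transformers chat workflow; settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "benchang1110/Qwen2.5-Taiwan-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "你是一位樂於助人的台灣助理。"},    # "You are a helpful Taiwanese assistant."
    {"role": "user", "content": "請用繁體中文介紹台北的三個景點。"},  # "Introduce three Taipei attractions in Traditional Chinese."
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```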

Key Differentiators

  • Tokenizer Swapping: Uses a tokenizer-swapping technique to remap the base model's Simplified Chinese tokens to Traditional Chinese, improving linguistic accuracy for Taiwanese contexts (see the sketch after this list).
  • Instruction Tuning (SFT): Fine-tuned on lianghsun/tw-instruct-500k with LoRA, preserving the base model's strong general capabilities while specializing in Traditional Chinese instruction following.
  • Alignment (DPO): Further aligned with zake7749/kyara-chinese-preference-rl-dpo-s0-30K to produce structured, logical, and list-based outputs.
  • Long Context: Supports a context length of 131,072 tokens, enabling processing of extended conversations and long documents.
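
The exact tokenizer-swapping procedure is not reproduced here; the following sketch only illustrates the underlying idea by scanning the base tokenizer's vocabulary with OpenCC and listing entries whose Simplified form differs from their Traditional (Taiwan) conversion. The opencc package and the s2twp conversion profile are assumptions for illustration, not the author's actual pipeline.

```python
# Illustration only: NOT the author's tokenizer-swapping code.
# It pairs Simplified Chinese vocabulary entries of the base tokenizer with their
# Traditional (Taiwan) forms via OpenCC, the kind of mapping a tokenizer swap relies on.
from opencc import OpenCC                 # e.g. pip install opencc-python-reimplemented
from transformers import AutoTokenizer

cc = OpenCC("s2twp")                      # Simplified -> Traditional (Taiwan standard, with phrases)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

pairs = []
for token_id in range(len(tok)):
    text = tok.decode([token_id])
    converted = cc.convert(text)
    if converted != text:
        # The Traditional form may already tokenize to one or more existing ids.
        pairs.append((token_id, text, converted, tok.encode(converted, add_special_tokens=False)))

print(f"{len(pairs)} vocabulary entries change under s2twp conversion")
for token_id, text, converted, ids in pairs[:10]:
    print(token_id, repr(text), "->", repr(converted), ids)
```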

Performance

The model demonstrates strong performance on Traditional Chinese benchmarks:

  • TMLU: Achieves 68.27% accuracy, outperforming comparable models such as Llama-3.2-Taiwan-3B-Instruct (36.82%) and Llama-3-Taiwan-8B-Instruct (59.50%).
  • TMMLU+: Scores 58.60% accuracy, again ahead of Llama-3.2-Taiwan-3B-Instruct (31.15%) and Llama-3-Taiwan-8B-Instruct (52.28%).

Use Cases

This model excels in:

  • Multi-turn conversations in Traditional Chinese (a multi-turn sketch follows this list).
  • Complex text generation, such as article writing and formal letter composition.
  • Summarization of Traditional Chinese documents.
  • Role-playing scenarios with customizable system prompts.
  • Taiwan-specific knowledge understanding, as demonstrated by its ability to list Taiwanese attractions.
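
As a concrete illustration of the multi-turn and role-play use cases, the sketch below carries the conversation state across turns with the transformers text-generation pipeline; the persona and questions are made up for demonstration.

```python
# Multi-turn role-play sketch using the transformers text-generation pipeline.
# The system prompt and questions are illustrative only.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="benchang1110/Qwen2.5-Taiwan-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "你是一位台灣的導遊，請用繁體中文回答。"},  # role-play persona via system prompt
    {"role": "user", "content": "請推薦三個台北的景點。"},                    # turn 1
]

# Turn 1: the pipeline returns the whole conversation with the new assistant reply appended.
messages = pipe(messages, max_new_tokens=256)[0]["generated_text"]
print(messages[-1]["content"])

# Turn 2: a follow-up question that depends on the previous answer.
messages.append({"role": "user", "content": "第二個景點適合帶小孩去嗎？"})
messages = pipe(messages, max_new_tokens=256)[0]["generated_text"]
print(messages[-1]["content"])
```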