Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base
Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base is an 8-billion-parameter Qwen3-Base model developed by Yvthyvq and adapted to the Liujgoj Cantonese Orthography. It has undergone continued pre-training on speech-native Latinized Cantonese text so that it internalizes Liujgoj's syllable structure and tone encoding. Its primary purpose is machine-native understanding and generation of Cantonese text independent of Hanzi, with a focus on text completion and continuation in Liujgoj.
Overview
Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base is an 8 billion parameter model built upon the Qwen3-8B-Base architecture. It has undergone Continued Pre-Training (CPT) specifically for the Liujgoj Cantonese Orthography.
Key Capabilities
- Liujgoj Orthography Integration: The model is trained to internalize the syllable structure and tone encoding of the Liujgoj system, a complete, independent Latin orthography for Cantonese designed for both writing and reading.
- Hanzi Independence: A core objective is to enable machine-native Cantonese text understanding and generation without reliance on traditional Chinese characters.
- Text Completion: As a CPT base model, it is geared toward continuing and completing Liujgoj text.
Training Details
- Base Model: Qwen3-8B-Base
- Training Pipeline: LLaMA-Factory (LoRA CPT, 3 Epochs)
- Corpus: Curated transcriptions of spoken dialogues from classic Hong Kong films.
- Performance: Final training loss of 1.5951, which the author takes as evidence of deep semantic alignment.
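For a rough sense of scale, the reported loss can be converted to a perplexity. The sketch below assumes the figure is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in the card. The variable names are illustrative only.

```python
import math

# Final training loss from the Training Details list; assumed to be the
# mean per-token cross-entropy in nats (natural logarithm).
final_training_loss = 1.5951

# Perplexity is the exponential of the mean cross-entropy.
training_perplexity = math.exp(final_training_loss)

print(f"training perplexity: {training_perplexity:.2f}")  # prints 4.93
```

Under that assumption, the model is on average about as uncertain as a uniform choice among roughly five tokens on its training corpus. This is a training-set figure, so it says nothing by itself about performance on unseen Liujgoj text.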
Usage
This is currently a CPT base model, so it is best suited to plain text continuation: supply a Liujgoj prefix and let the model continue it. It can be called through OpenAI-compatible completions or chat/completions interfaces, and it is intended for developers working with the Liujgoj Cantonese Orthography.
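A minimal sketch of a text-continuation call against an OpenAI-compatible completions endpoint might look like the following. It assumes the model is already being served somewhere that exposes such an API. The base URL, the sampling defaults, and the helper names `build_completion_payload` and `complete` are placeholders for illustration and do not come from the model card.

```python
import json
import urllib.request

MODEL_ID = "Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base"
# Assumption: a local OpenAI-compatible server; replace with your own endpoint.
BASE_URL = "http://localhost:8000/v1"


def build_completion_payload(prompt, max_tokens=64, temperature=0.7):
    """Build the JSON body for a POST to {BASE_URL}/completions."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,            # Liujgoj text to be continued
        "max_tokens": max_tokens,    # illustrative default, not a recommendation
        "temperature": temperature,  # illustrative default, not a recommendation
    }


def complete(prompt, base_url=BASE_URL, **kwargs):
    """Send the prompt to the server and return the generated continuation."""
    body = json.dumps(build_completion_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # The completions response carries the generated text in choices[0].text.
    return result["choices"][0]["text"]
```

With a server running, `complete(prefix)` would return the model's continuation of `prefix`, where `prefix` is any string of Liujgoj text. Nothing is sent until `complete` is called, so the payload can be inspected first with `build_completion_payload`.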