Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base

Text Generation

  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Jun 1, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights
  • Cold

Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base is an 8-billion-parameter Qwen3-Base model developed by Yvthyvq and adapted to the Liujgoj Cantonese Orthography. The model has undergone continued pre-training on speech-native Latinized Cantonese text so that it internalizes Liujgoj's syllable structure and tone encoding. Its purpose is to enable machine-native understanding and generation of Cantonese text without relying on Hanzi; it is strongest at text completion and continuation in Liujgoj.

Overview

Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base is an 8-billion-parameter model built on Qwen3-8B-Base. It has undergone Continued Pre-Training (CPT) specifically for the Liujgoj Cantonese Orthography.

Key Capabilities

  • Liujgoj Orthography Integration: The model is trained to internalize the syllable structure and tone encoding of the Liujgoj system, a complete, independent Latin-script orthography for Cantonese designed for both writing and reading.
  • Hanzi Independence: A core objective is machine-native understanding and generation of Cantonese text without reliance on traditional Chinese characters.
  • Text Completion: As a CPT base model, it is geared toward continuing and completing Liujgoj text.

Training Details

  • Base Model: Qwen3-8B-Base
  • Training Pipeline: LLaMA-Factory (LoRA CPT, 3 Epochs)
  • Corpus: Curated transcriptions of spoken dialogues from classic Hong Kong films.
  • Performance: Final training loss of 1.5951, which the author takes as indicating deep semantic alignment.
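
To put the reported loss on a more familiar scale, it can be converted to a perplexity. The snippet below is a minimal sketch that assumes the figure is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in the source. It describes the fit to the training corpus only, not held-out performance.

```python
import math

# Final training loss reported in the training details.
final_training_loss = 1.5951

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(final_training_loss)

print(f"approximate training perplexity: {perplexity:.2f}")
```

Under that assumption the training perplexity comes out at roughly 4.93 per token.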

Usage

This is currently a CPT base model, so it is best suited to pure text continuation through an OpenAI-compatible completions or chat/completions interface. It is intended for developers working with the Liujgoj Cantonese Orthography.
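
A text-continuation call against an OpenAI-compatible completions endpoint might look like the sketch below. It uses only the Python standard library. The base URL, the `OPENAI_BASE_URL` environment variable, the helper names `build_completion_request` and `complete`, and the sampling defaults are all illustrative assumptions, not values taken from this page; substitute the endpoint and key of whichever provider serves the model.

```python
import json
import os
import urllib.request

# Assumption: placeholder base URL; replace it with your provider's
# OpenAI-compatible endpoint (for example via the OPENAI_BASE_URL variable).
BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://example.invalid/v1")
MODEL_ID = "Yvthyvq/Liujgoj-Cantonese-Qwen3-8B-Base"


def build_completion_request(prompt, max_tokens=64, temperature=0.7):
    """Build the JSON body for a POST to {BASE_URL}/completions.

    The default sampling values are illustrative, not recommendations.
    """
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, api_key, **kwargs):
    """Send the prompt and return the continuation text of the first choice."""
    body = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # OpenAI-style completions responses carry the text in choices[i].text.
    return payload["choices"][0]["text"]
```

Calling `complete("<Liujgoj text to continue>", api_key=...)` returns the model's continuation of the prompt. Because this is a base model, the plain completions route with a raw Liujgoj prefix is likely the more natural fit than a chat-formatted request.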