aitf-kpm-ugm/Qwen3-4B-CPT-Base
aitf-kpm-ugm/Qwen3-4B-CPT-Base is a 4-billion-parameter, decoder-only causal language model developed by AITF UGM 2026. It is a continued pre-training (CPT) variant of Qwen3-4B-Base, adapted to Indonesian on a corpus of roughly 200M tokens. As a base model, it targets Indonesian text completion and serves as a foundation for downstream supervised fine-tuning (SFT) on Indonesian NLP tasks.
Qwen3-4B-CPT-Base: Indonesian Domain Adaptation
This model, developed by AITF UGM 2026, is a continued pre-training (CPT) variant of Qwen/Qwen3-4B-Base. It was adapted to Indonesian through further pre-training on a ~200 million-token Indonesian corpus comprising news (70%), Wikipedia (20%), and social media (10%).
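As a rough illustration of the corpus mix described above, the sketch below turns the stated percentages into approximate per-source token budgets. It assumes the percentages are shares of tokens (the card does not say whether they are measured in tokens or documents), and the names `TOTAL_TOKENS` and `SHARES_PCT` are illustrative only.

```python
# Approximate corpus size stated in the model card (~200M tokens).
TOTAL_TOKENS = 200_000_000

# Source shares in percent, as stated in the model card.
# Assumption: the percentages refer to token counts, not document counts.
SHARES_PCT = {"news": 70, "wikipedia": 20, "social_media": 10}

# Integer arithmetic avoids floating-point rounding surprises.
token_budget = {
    source: TOTAL_TOKENS * pct // 100 for source, pct in SHARES_PCT.items()
}

for source, tokens in token_budget.items():
    print(f"{source:>12}: ~{tokens // 1_000_000}M tokens")
```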
Key Capabilities & Differentiators
- Indonesian Language Specialization: Reduces perplexity on Indonesian text by roughly 23% overall relative to the vanilla Qwen3-4B-Base model. It also outperforms the larger vanilla Qwen3-8B-Base on all tested Indonesian subsets.
- Base Model for SFT: Designed as a foundational model for subsequent supervised fine-tuning (SFT) for specific Indonesian NLP tasks.
- Efficient Training: Continued pre-training used LoRA (rank 128, alpha 256) with bf16 mixed precision.
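To make the LoRA hyperparameters above more concrete, here is a minimal sketch of what rank 128 and alpha 256 imply under the standard LoRA formulation, where the low-rank update is scaled by alpha / rank. The card does not state which modules were adapted or whether a scaling variant was used, so the `LoraSettings` class and the 2048 x 2048 matrix are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    rank: int = 128   # from the model card
    alpha: int = 256  # from the model card

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A has shape (rank, d_in) and B has shape (d_out, rank).
        return self.rank * (d_in + d_out)


settings = LoraSettings()

# Hypothetical square projection; these dimensions are an assumption,
# not taken from the model card.
d_in, d_out = 2048, 2048
full_params = d_in * d_out
lora_params = settings.adapter_params(d_in, d_out)

print(f"scaling factor: {settings.scaling}")
print(f"adapter params: {lora_params:,} "
      f"({lora_params / full_params:.1%} of the full matrix)")
```

Only the two adapter matrices are trained while the original weight stays frozen, which is where the efficiency comes from.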
Use Cases
- Indonesian Text Completion: Direct use for generating coherent Indonesian text.
- Foundation for Downstream Tasks: Suitable as a base for fine-tuning on tasks such as Indonesian summarization, aspect-based sentiment analysis (ABSA), chatbot Q&A, and narrative analysis.
- Perplexity Benchmarking: Useful for evaluating Indonesian language model performance against other baselines.
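For the perplexity use case above, the following sketch shows how perplexity and a relative reduction between two models can be computed from per-token log-probabilities. The log-probability values are made up for illustration and are not results from this model; the function names are likewise hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_reduction(baseline_ppl: float, adapted_ppl: float) -> float:
    """Fractional perplexity reduction of the adapted model vs. the baseline."""
    return (baseline_ppl - adapted_ppl) / baseline_ppl


# Made-up per-token log-probabilities for the same text under two models.
baseline_logprobs = [-2.5, -2.1, -2.9, -2.5]
adapted_logprobs = [-2.2, -1.9, -2.7, -2.2]

base_ppl = perplexity(baseline_logprobs)
cpt_ppl = perplexity(adapted_logprobs)
print(f"baseline: {base_ppl:.2f}, adapted: {cpt_ppl:.2f}, "
      f"reduction: {relative_reduction(base_ppl, cpt_ppl):.1%}")
```

Token-level perplexities are only directly comparable when both models tokenize the evaluation text the same way, so any baseline should be checked for tokenizer compatibility first.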
Limitations
As a base model, it is not instruction-tuned and therefore not suitable for chat, instruction-following, or generating structured outputs like JSON without further SFT. Its training data, being news-heavy, may introduce biases reflecting media and social media narratives.
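Because the model is a base model, prompts are best written as text to be continued rather than as chat messages. The sketch below shows one way to do this with the Hugging Face `transformers` library, assuming the repository hosts full (merged) weights loadable with `AutoModelForCausalLM`; the card does not confirm this. The `complete` helper and the example prompt are hypothetical.

```python
MODEL_ID = "aitf-kpm-ugm/Qwen3-4B-CPT-Base"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` as plain text; no chat template is applied."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies or trigger a model download.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: the repo contains merged weights, not only a LoRA adapter.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens and decode only the continuation.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `complete("Ibu kota Indonesia adalah")` would download the weights and return the model's continuation of the sentence. Greedy decoding (`do_sample=False`) is used here only to keep the output deterministic.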