AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public
AksaraLLM-Qwen-1.5B-v5-public is an AksaraLLM-developed Qwen2-based causal language model with 1.78 billion parameters and a 32,768-token context length. It is specifically fine-tuned for Indonesian language tasks, demonstrating a perplexity of 8.4 on Indonesian text. This model is optimized as a daily-driver Indonesian LM, excelling in coherent factual Indonesian completions while also supporting English.
Loading preview...
AksaraLLM-Qwen-1.5B-v5-public Overview
AksaraLLM-Qwen-1.5B-v5-public is a 1.78 billion parameter language model based on the Qwen2 architecture, developed by AksaraLLM. It features a substantial context length of 32,768 tokens. This model is specifically tuned for the Indonesian language, achieving a perplexity of 8.4 on a baseline audit of 50 short Indonesian sentences, making it a strong performer in its size class for Indonesian text generation.
Key Capabilities & Characteristics
- Indonesian Language Proficiency: Demonstrates high coherence and factual accuracy in Indonesian completions, serving as a recommended daily-driver Indonesian LM.
- Bilingual Support: Capable of generating coherent English text, indicating bilingual (Indonesian/English) functionality.
- Efficient Performance: Despite its relatively small size (1.78B parameters), it shows competitive performance for Indonesian tasks.
- Qwen2 Architecture: Built upon the robust Qwen2 framework, ensuring a solid foundation for language understanding and generation.
Recommended Use Cases
- Indonesian Text Generation: Ideal for applications requiring factual completions, creative writing, or general conversational responses in Indonesian.
- Bilingual Applications: Suitable for scenarios where both Indonesian and English language processing are needed.
- Resource-Constrained Environments: Its 1.78B parameter count makes it a viable option for deployment where larger models might be impractical.
Known Issues & Recommendations
- A
tie_word_embeddingsconfiguration bug requires settingtie_word_embeddings: falseinconfig.jsonto prevent potential model corruption upon re-saving. - The model's identity is uncalibrated, meaning it may identify as Qwen. Identity SFT (Supervised Fine-Tuning) is recommended for specific persona alignment.
- Lacks a bundled chat template, requiring manual application of Qwen2 ChatML for conversational use cases.