Overview
Sunflower-14B: Multilingual Model for Ugandan Languages
Sunflower-14B, developed by Sunbird AI, is a 14-billion-parameter causal language model built on the Qwen 3 architecture. It is specialized for 31 Ugandan languages as well as English, which makes it a targeted tool for language tasks in the region.
Key Capabilities
- Multilingual Translation: Translates accurately between English and Ugandan languages, and between pairs of Ugandan languages. On these language pairs, it achieves higher average chrF scores than Gemini 2.5 Pro and GPT-4o.
- Text Generation: Capable of generating text in various Ugandan languages.
- Question Answering: Supports question answering in Ugandan languages.
- Robust Training: Trained on a diverse corpus of approximately 750 million characters drawn from digitized books, radio transcripts, web data (MADLAD-400, Common Crawl), and existing multilingual datasets. This was followed by supervised fine-tuning and Iterative Reasoning Preference Optimization (RPO) to reduce glitches and hallucinations.
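chrF, the metric used in the translation comparison above, is a character n-gram F-score: it measures how many character sequences of length 1 to n in a translation also appear in a reference translation. The following is a minimal Python sketch of a sentence-level version. It assumes the common defaults (n-gram orders 1 through 6, β = 2, whitespace ignored). The function names `char_ngrams` and `chrf` are hypothetical, and this is not Sunbird AI's evaluation code.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Sentence-level chrF on a 0-100 scale (illustrative sketch).

    Precision and recall are averaged over the n-gram orders 1..max_order
    for which both strings have at least one n-gram, then combined into an
    F-beta score. With beta=2, recall counts more heavily than precision.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # one of the strings is too short for this order
        matches = sum((hyp & ref).values())  # matches clipped to the smaller count
        precisions.append(matches / hyp_total)
        recalls.append(matches / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Compare a candidate translation against a reference.
    print(round(chrf("Oli otya", "Oli otya nnyo"), 1))
```

Because it works on characters rather than whole words, chrF gives partial credit for correct stems and affixes, which is useful for morphologically rich languages. Corpus-level scores from tools such as sacrebleu aggregate n-gram counts across all sentences before computing the F-score, so averaging this function over sentences will not reproduce published numbers exactly.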
Good For
- Developers and researchers working on applications requiring accurate translation or text generation for Ugandan languages.
- Projects focused on linguistic diversity and supporting under-resourced languages.
- Use cases where regional language understanding and generation are critical, such as educational tools, local content creation, or communication platforms in Uganda.
Limitations
Performance varies across languages depending on how much training data was available for each. The model should not be used in critical applications without human oversight, and it may reflect biases present in its training data.