Gemma-SEA-LION-v4-4B-VL: A Vision-Language Model for Southeast Asia
Gemma-SEA-LION-v4-4B-VL is a 4.3-billion-parameter Vision-Language Model (VLM) developed by AI Singapore and built on the gemma-3-4b-it architecture. It was post-trained on approximately 6.7 million instruction-text pairs to adapt it to the Southeast Asian (SEA) region, building multilingual and multicultural fluency in key SEA languages such as Indonesian, Vietnamese, Thai, Filipino, Tamil, Burmese, and Malay.
Key Capabilities
- Multilingual Fluency: Improved understanding and generation across the Southeast Asian languages listed above.
- Vision-Language Integration: Inherits image and text capabilities from the base Gemma model, with experimental visual parsing in Thai, Chinese, and English.
- Tool Calling: Supports function calling, enabling agentic and tool-using applications.
- Large Context Window: Supports a 128K-token context length for long documents and extended multi-turn conversations.
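As a sketch of how the vision-language capabilities might be exercised, the snippet below builds a chat payload that mixes an image with a question in a SEA language and hands it to the standard Hugging Face transformers image-text-to-text pipeline. The repository name, example URL, and prompt are assumptions for illustration; the message format follows the generic transformers multimodal chat convention rather than anything specific to this model.

```python
# Sketch: multimodal inference via the transformers "image-text-to-text"
# pipeline. MODEL_ID is an assumed Hugging Face repo name, not confirmed
# by this document.
MODEL_ID = "aisingapore/Gemma-SEA-LION-v4-4B-VL"  # assumption


def build_messages(image_url: str, question: str) -> list[dict]:
    """Build a chat payload pairing one image with one text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


RUN_INFERENCE = False  # set True on a machine with the model available
if RUN_INFERENCE:
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model=MODEL_ID)
    messages = build_messages(
        "https://example.com/street-sign.jpg",  # hypothetical image
        "Apa yang tertulis di papan ini?",  # Indonesian: "What is written on this board?"
    )
    out = pipe(text=messages, max_new_tokens=128)
    print(out[0]["generated_text"])
```

The guarded block requires downloading the model weights; the message-building helper alone shows the expected input shape.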
When to Use This Model
This model is particularly well-suited for applications requiring:
- Multilingual interactions within the Southeast Asian linguistic context.
- Vision-language tasks that benefit from regional language understanding.
- Agentic workflows leveraging its tool-calling functionality.
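For the tool-calling use case above, a typical integration defines tools in the JSON-schema style that transformers chat templates accept through their `tools` argument, then parses the model's emitted call. The tool name, fields, and wire format below are hypothetical examples, not a format documented for this model.

```python
import json

# Hypothetical tool definition in the common JSON-schema style used with
# transformers chat templates (passed via `tools=` to apply_chat_template).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Jakarta'"},
            },
            "required": ["city"],
        },
    },
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Extract the function name and arguments from a tool call serialized
    as JSON. The exact serialization a given model emits is model-specific."""
    call = json.loads(raw)
    return call["name"], call.get("arguments", {})
```

An agent loop would match the parsed name against its registered tools, execute the function, and append the result to the conversation before the next generation step.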
Evaluations on SEA-HELM, SEA-IFEval, and SEA-MTBench measure its general language capabilities, instruction following, and multi-turn chat performance; dedicated benchmarks for tool calling and visual parsing are also available.