Gemma-SEA-LION-v4-4B-VL: A Vision-Language Model for Southeast Asia
Gemma-SEA-LION-v4-4B-VL is a 4.3-billion-parameter Vision-Language Model (VLM) developed by AI Singapore and built on the gemma-3-4b-it architecture. It was post-trained on approximately 6.7 million instruction-text pairs to adapt it to the Southeast Asian (SEA) region, building multilingual and multicultural fluency in key SEA languages such as Indonesian, Vietnamese, Thai, Filipino, Tamil, Burmese, and Malay.
Key Capabilities
- Multilingual Fluency: Improved understanding and generation across the Southeast Asian languages listed above.
- Vision-Language Integration: Inherits image and text capabilities from the base Gemma model, with experimental visual parsing in Thai, Chinese, and English.
- Tool Calling: Supports function calling, enabling agentic and tool-using applications.
- Large Context Window: Supports a 128K-token context length for long documents and extended multi-turn conversations.
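As a sketch of how the vision-language capabilities might be exercised, the snippet below builds a chat payload that mixes an image with a question in a SEA language and hands it to the standard Hugging Face transformers image-text-to-text pipeline. The repository name, example URL, and prompt are assumptions for illustration; the message format follows the generic transformers multimodal chat convention rather than anything specific to this model.

```python
# Sketch: multimodal inference via the transformers "image-text-to-text"
# pipeline. MODEL_ID is an assumed Hugging Face repo name, not confirmed
# by this document.
MODEL_ID = "aisingapore/Gemma-SEA-LION-v4-4B-VL"  # assumption


def build_messages(image_url: str, question: str) -> list[dict]:
    """Build a chat payload pairing one image with one text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


RUN_INFERENCE = False  # set True on a machine with the model available
if RUN_INFERENCE:
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model=MODEL_ID)
    messages = build_messages(
        "https://example.com/street-sign.jpg",  # hypothetical image
        "Apa yang tertulis di papan ini?",  # Indonesian: "What is written on this board?"
    )
    out = pipe(text=messages, max_new_tokens=128)
    print(out[0]["generated_text"])
```

The guarded block requires downloading the model weights; the message-building helper alone shows the expected input shape.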
When to Use This Model
This model is particularly well-suited for applications requiring:
- Multilingual interactions within the Southeast Asian linguistic context.
- Vision-language tasks that benefit from regional language understanding.
- Agentic workflows leveraging its tool-calling functionality.
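For the tool-calling use case above, a typical integration defines tools in the JSON-schema style that transformers chat templates accept through their `tools` argument, then parses the model's emitted call. The tool name, fields, and wire format below are hypothetical examples, not a format documented for this model.

```python
import json

# Hypothetical tool definition in the common JSON-schema style used with
# transformers chat templates (passed via `tools=` to apply_chat_template).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Jakarta'"},
            },
            "required": ["city"],
        },
    },
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Extract the function name and arguments from a tool call serialized
    as JSON. The exact serialization a given model emits is model-specific."""
    call = json.loads(raw)
    return call["name"], call.get("arguments", {})
```

An agent loop would match the parsed name against its registered tools, execute the function, and append the result to the conversation before the next generation step.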
Evaluations on SEA-HELM, SEA-IFEval, and SEA-MTBench measure its general language capabilities, instruction following, and multi-turn chat performance; dedicated benchmarks for tool calling and visual parsing are also available.