Overview

Qwen-SEA-LION-v4-4B-VL is a 4-billion parameter Vision-Language Model (VLM) developed by AI Singapore, specifically designed for the Southeast Asian (SEA) region. It is built on the Qwen3-VL-4B-Instruct architecture and has undergone extensive supervised fine-tuning (SFT) using approximately 9 million instruction-text pairs. This post-training process instills strong multilingual and multicultural fluency, covering English and seven key SEA languages: Burmese, Indonesian, Filipino, Malay, Tamil, Thai, and Vietnamese.

Key Capabilities

Multilingual and Multicultural Fluency: Fine-tuned for English and 7 SEA languages, making it highly relevant for regional applications.
Vision-Language Model (VLM): Inherits enhanced vision-language capabilities from the Qwen3-VL architecture, including Visual Question Answering (VQA) and Image Captioning.
Long-Context Multimodal Architecture: Features a native 256K context window, supporting complex multimodal inputs.
Edge-Optimized Inference: Designed for resource-efficient deployment.
Tool Use: Supports tool use functionalities.

Evaluation and Performance

The model was evaluated on general language capabilities using the SEA-HELM evaluation benchmark, covering tasks like QA, Sentiment Analysis, and Translation. Instruction-following and multi-turn chat capabilities were assessed with SEA-IFEval and SEA-MTBench, respectively. Notably, despite text-only fine-tuning, the model successfully retains the high-performance vision-language capabilities of its base model, as confirmed by evaluations on VQA and Image Captioning tasks using SEA-specific datasets.

Good for

Applications requiring strong language understanding and generation in Southeast Asian languages.
Multimodal tasks involving both text and images, particularly within a SEA context.
Use cases where a long context window for multimodal input is beneficial.
Deployment in resource-constrained environments due to its edge-optimized design.

Overview

Overview

Key Capabilities

Evaluation and Performance

Good for

Full Model Card (README)