Gemma-SEA-LION-v4-27B-VL: Vision-Text Model for Southeast Asia
Gemma-SEA-LION-v4-27B-VL is a 27-billion-parameter instruction-tuned vision-text model developed by SEACrowd and the AI Products Pillar at AI Singapore. It is built on the Gemma 3 architecture, inheriting its 128K-token context length and robust image and text understanding capabilities. The model has undergone extensive post-training on approximately 540,000 instruction-image pairs spanning 11 languages: Burmese, English, Indonesian, Khmer, Lao, Malay, Mandarin, Tagalog, Tamil, Thai, and Vietnamese.
Key Capabilities
- Multilingual Vision-Text Understanding: Excels in comprehending and generating responses based on visual and textual inputs across a wide range of Southeast Asian languages.
- Document Comprehension & Visual Q&A: Capable of understanding information within documents and answering questions grounded in images.
- Image-Grounded Reasoning: Performs reasoning tasks that require interpreting visual information.
- Advanced Function Calling: Supports structured outputs for seamless integration into larger systems.
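The model card does not pin down the exact format of these structured outputs, so the following is a hedged sketch of the consumer side: it assumes the model has been prompted to emit a tool call as a JSON object of the form `{"name": ..., "arguments": {...}}`, optionally inside a ```json fence (a common convention, not a documented contract of this model), and extracts it for dispatch.

```python
import json
import re

def extract_tool_call(text: str):
    """Pull the first JSON tool call out of a model reply, fenced or bare.

    Assumes the reply carries {"name": ..., "arguments": {...}} -- an
    illustrative convention, not a format guaranteed by this model.
    """
    match = re.search(r"```json\s*(\{.*?\})\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text.strip()
    call = json.loads(payload)
    return call["name"], call.get("arguments", {})

# Example reply a function-calling prompt might produce:
reply = '```json\n{"name": "get_weather", "arguments": {"city": "Singapore"}}\n```'
name, args = extract_tool_call(reply)
# name -> "get_weather", args -> {"city": "Singapore"}
```

A real integration would validate `name` against a registry of allowed tools before dispatching, since model output is untrusted input.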
Good For
- Applications requiring strong vision-text capabilities in Southeast Asian contexts.
- Tasks such as visual question answering (VQA) and image captioning, particularly with SEA-focused content.
- Developers seeking performance comparable to larger closed models on SEA tasks, as indicated by the model's strong leaderboard rankings.
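As a starting point, visual question answering with this model can be sketched with the Hugging Face transformers library and its Gemma 3-style chat template. This is a hedged sketch: the repo id `aisingapore/Gemma-SEA-LION-v4-27B-VL` is assumed from the model name (verify it against the actual Hugging Face listing), the image URL is a placeholder, and a recent transformers release with Gemma 3 support is required.

```python
def build_messages(image_url: str, question: str) -> list:
    """Build a Gemma 3-style chat message with one image and one question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_inference():
    """End-to-end VQA sketch; calling this downloads ~27B weights."""
    # Heavy imports live here so build_messages stays lightweight.
    import torch
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "aisingapore/Gemma-SEA-LION-v4-27B-VL"  # assumed repo id
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = build_messages(
        "https://example.com/receipt.jpg",  # placeholder image URL
        "Apakah total pada struk ini?",  # Indonesian: "What is the total on this receipt?"
    )
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    new_tokens = out[0][inputs["input_ids"].shape[-1]:]
    print(processor.decode(new_tokens, skip_special_tokens=True))
```

Call `run_inference()` on a machine with enough GPU memory for a 27B model; the question is deliberately in Indonesian to exercise the SEA-language coverage described above.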