Gemma-SEA-LION-v4-27B-VL: Vision-Text Model for Southeast Asia
Gemma-SEA-LION-v4-27B-VL is a 27-billion-parameter instruction-tuned vision-text model developed by SEACrowd and the AI Products Pillar at AI Singapore. It is built on the Gemma 3 architecture, inheriting its 128K-token context length and robust image and text understanding capabilities. The model has undergone extensive post-training on approximately 540,000 instruction-image pairs spanning 11 languages: Burmese, English, Indonesian, Khmer, Lao, Malay, Mandarin, Tagalog, Tamil, Thai, and Vietnamese.
Key Capabilities
- Multilingual Vision-Text Understanding: Excels in comprehending and generating responses based on visual and textual inputs across a wide range of Southeast Asian languages.
- Document Comprehension & Visual Q&A: Capable of understanding information within documents and answering questions grounded in images.
- Image-Grounded Reasoning: Performs reasoning tasks that require interpreting visual information.
- Advanced Function Calling: Supports structured outputs for seamless integration into larger systems.
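The model card does not pin down the exact format of these structured outputs, so the following is a hedged sketch of the consumer side: it assumes the model has been prompted to emit a tool call as a JSON object of the form `{"name": ..., "arguments": {...}}`, optionally inside a ```json fence (a common convention, not a documented contract of this model), and extracts it for dispatch.

```python
import json
import re

def extract_tool_call(text: str):
    """Pull the first JSON tool call out of a model reply, fenced or bare.

    Assumes the reply carries {"name": ..., "arguments": {...}} -- an
    illustrative convention, not a format guaranteed by this model.
    """
    match = re.search(r"```json\s*(\{.*?\})\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text.strip()
    call = json.loads(payload)
    return call["name"], call.get("arguments", {})

# Example reply a function-calling prompt might produce:
reply = '```json\n{"name": "get_weather", "arguments": {"city": "Singapore"}}\n```'
name, args = extract_tool_call(reply)
# name -> "get_weather", args -> {"city": "Singapore"}
```

A real integration would validate `name` against a registry of allowed tools before dispatching, since model output is untrusted input.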
Good For
- Applications requiring strong vision-text capabilities in Southeast Asian contexts.
- Tasks such as visual question answering (VQA) and image captioning, particularly with SEA-focused content.
- Developers seeking performance comparable to larger closed models on SEA tasks, as indicated by the model's strong leaderboard rankings.
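As a starting point, visual question answering with this model can be sketched with the Hugging Face transformers library and its Gemma 3-style chat template. This is a hedged sketch: the repo id `aisingapore/Gemma-SEA-LION-v4-27B-VL` is assumed from the model name (verify it against the actual Hugging Face listing), the image URL is a placeholder, and a recent transformers release with Gemma 3 support is required.

```python
def build_messages(image_url: str, question: str) -> list:
    """Build a Gemma 3-style chat message with one image and one question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_inference():
    """End-to-end VQA sketch; calling this downloads ~27B weights."""
    # Heavy imports live here so build_messages stays lightweight.
    import torch
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "aisingapore/Gemma-SEA-LION-v4-27B-VL"  # assumed repo id
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = build_messages(
        "https://example.com/receipt.jpg",  # placeholder image URL
        "Apakah total pada struk ini?",  # Indonesian: "What is the total on this receipt?"
    )
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    new_tokens = out[0][inputs["input_ids"].shape[-1]:]
    print(processor.decode(new_tokens, skip_special_tokens=True))
```

Call `run_inference()` on a machine with enough GPU memory for a 27B model; the question is deliberately in Indonesian to exercise the SEA-language coverage described above.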