SEACrowd/Gemma-SEA-LION-v4-27B-VL

Public · Vision · 27B · FP8 · 32768 · License: gemma

Overview

Gemma-SEA-LION-v4-27B-VL: Vision-Text Model for Southeast Asia

Gemma-SEA-LION-v4-27B-VL is a 27-billion-parameter, instruction-tuned vision-text model developed by SEACrowd and AI Singapore. Built on the Gemma 3 architecture, it inherits that family's 128K context length and strong image and text understanding, including document comprehension, visual Q&A, and image-grounded reasoning. The model also supports function calling and structured outputs for system integration.
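For vision-text inference through Transformers, prompts follow the multimodal chat format. The sketch below shows only the payload construction: the image URL and question are placeholders, and the actual model loading and generation steps are indicated in comments rather than executed.

```python
model_id = "SEACrowd/Gemma-SEA-LION-v4-27B-VL"

# One user turn pairing an image with a question, in the role/content
# structure used by Transformers multimodal chat templates.
messages = [
    {
        "role": "user",
        "content": [
            # Placeholder image URL for illustration only.
            {"type": "image", "url": "https://example.com/hawker_stall.jpg"},
            {"type": "text", "text": "What dish is being prepared here?"},
        ],
    }
]

# With the weights available, this payload would be tokenized via
#   processor = AutoProcessor.from_pretrained(model_id)
#   inputs = processor.apply_chat_template(
#       messages, add_generation_prompt=True, tokenize=True, return_tensors="pt")
# and the answer decoded from the model's generate() output.
```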

Key Capabilities & Differentiators

  • Multilingual Vision-Text Understanding: Post-trained on approximately 540k instruction-image pairs in Burmese, English, Indonesian, Khmer, Lao, Malay, Mandarin, Tagalog, Tamil, Thai, and Vietnamese.
  • Southeast Asian Task Optimization: Excels at tasks specific to the Southeast Asian region, demonstrating performance comparable to larger closed models and outperforming other open models under 200 billion parameters as of October 2025.
  • Comprehensive Vision-Text Features: Capable of visual question answering, image captioning, and image-grounded reasoning.
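As a sketch of the function-calling support mentioned above, a tool can be described with a JSON-Schema-style declaration that the model targets when emitting a structured call. The tool name, description, and parameters below are invented for illustration:

```python
import json

# Hypothetical tool declaration; the name and parameters are invented examples.
currency_tool = {
    "name": "get_exchange_rate",
    "description": "Look up the current exchange rate between two currencies.",
    "parameters": {
        "type": "object",
        "properties": {
            "base": {"type": "string", "description": "ISO 4217 code, e.g. SGD"},
            "quote": {"type": "string", "description": "ISO 4217 code, e.g. THB"},
        },
        "required": ["base", "quote"],
    },
}

# The declaration is serialized into the prompt (e.g. via a chat template's
# tools argument); the model then emits a structured call such as:
expected_call = {"name": "get_exchange_rate",
                 "arguments": {"base": "SGD", "quote": "THB"}}
print(json.dumps(expected_call))
```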

Use Cases & Limitations

This model is particularly well-suited to applications requiring deep cultural and visual understanding in Southeast Asian contexts. It has been evaluated on VQA tasks (MARVL, CVQA, WorldCuisines) and image captioning (XM3600), with a focus on SEA examples. While strong on vision-text tasks, its text-only capabilities are comparable to those of its base model, Gemma-SEA-LION-v4-27B-IT, with no significant improvement in that area. Users should also be aware of potential hallucinations; the model has not undergone safety alignment, so downstream applications should add their own safety fine-tuning or guardrails.