sail/Sailor-0.5B-Chat

Parameters: 0.6B
Precision: BF16
Context length: 32,768 tokens
Released: Mar 2, 2024
License: apache-2.0
Hugging Face: https://huggingface.co/sail/Sailor-0.5B-Chat
Overview

Sailor-0.5B-Chat: South-East Asian Language Model

Sailor-0.5B-Chat is an instruction-tuned model from the Sailor suite, developed by Sea AI Lab (sail), with roughly 0.6 billion parameters. It is built on the Qwen 1.5 architecture and optimized for South-East Asian (SEA) languages, including Indonesian, Thai, Vietnamese, Malay, and Lao, while retaining strong performance in English and Chinese.
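
A minimal inference sketch with the Hugging Face transformers library is shown below. The chat roles, sampling settings, and dtype choice are assumptions (the model inherits Qwen 1.5's tokenizer and chat template); consult the upstream model card for the recommended usage.

    # Minimal sketch: load Sailor-0.5B-Chat and generate one reply.
    # Assumes the tokenizer ships a Qwen-1.5-style chat template; settings are illustrative.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "sail/Sailor-0.5B-Chat"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Indonesian prompt: "What is the capital of Indonesia?"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Apa ibu kota Indonesia?"},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(
        input_ids, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9
    )
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))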

Key Capabilities & Training

  • Multilingual Proficiency: Designed to understand and generate text across the diverse linguistic landscape of the SEA region.
  • Instruction-Tuned: Fine-tuned with publicly available datasets like aya_collection, aya_dataset, and OpenOrca to enhance its conversational abilities.
  • Robust Pre-training: Continually pre-trained from Qwen 1.5 base models on a high-quality, carefully deduplicated corpus that includes SlimPajama, SkyPile, CC100, and MADLAD-400.
  • Optimized for SEA: The pre-training data mixture was weighted across the different SEA languages through systematic experiments; the 0.5B model was trained on 400 billion tokens.
  • Benchmarked Performance: Demonstrates proficiency in tasks such as question answering and commonsense reasoning in SEA languages.

Use Cases

  • Applications requiring strong language understanding and generation in Indonesian, Thai, Vietnamese, Malay, and Lao.
  • Chatbots and conversational AI systems targeting users in South-East Asia (a minimal chat-loop sketch follows this list).
  • Research and development in low-resource language NLP for the SEA region.
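
As a sketch of the chatbot use case above, a hypothetical multi-turn loop might keep the running message history and re-apply the chat template on each turn. The role names, system prompt, and sampling settings here are assumptions, not an official recipe.

    # Hypothetical multi-turn chat loop around Sailor-0.5B-Chat.
    # The full history is re-encoded with the chat template on every turn.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "sail/Sailor-0.5B-Chat"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    history = [{"role": "system", "content": "You are a helpful assistant."}]
    while True:
        user_text = input("user> ").strip()
        if not user_text:  # an empty line ends the session
            break
        history.append({"role": "user", "content": user_text})
        input_ids = tokenizer.apply_chat_template(
            history, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output = model.generate(
            input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
        )
        reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
        history.append({"role": "assistant", "content": reply})
        print("assistant>", reply)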