Sailor-7B: Open Language Model for South-East Asia
Sailor-7B is a 7.7-billion-parameter model from the Sailor suite, developed by sail and built on the Qwen 1.5 architecture with a 32,768-token context length. The model is optimized for South-East Asian (SEA) languages, including Indonesian, Thai, Vietnamese, Malay, and Lao, while maintaining proficiency in English and Chinese.
Key Capabilities
- Multilingual Proficiency: Understands and generates text in Indonesian, Thai, Vietnamese, Malay, and Lao, alongside English and Chinese.
- Strong Performance: Benchmarked for tasks such as question answering and commonsense reasoning in SEA languages.
- Robust Training: Continually pre-trained on 200 billion tokens from a high-quality, deduplicated corpus including SlimPajama, SkyPile, CC100, and MADLAD-400, with careful balancing of language weights.
- Instruction-Tuned Variants: Base models are further fine-tuned on open-source instruction datasets to produce the instruction-following Sailor-Chat versions.
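For the chat variants, conversations are typically rendered in ChatML format, the template used by the Qwen 1.5 family that Sailor builds on. The sketch below is an illustration of that format, assuming it carries over to Sailor-Chat; in practice you should rely on the tokenizer's own chat template rather than hand-building strings.

```python
def build_chat_prompt(messages):
    """Render a list of {"role", "content"} messages in ChatML format
    (assumed from Sailor's Qwen 1.5 lineage; illustrative only)."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the prompt open for the assistant's reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

# Example: a system message plus an Indonesian user request.
prompt = build_chat_prompt([
    {"role": "system", "content": "You are a helpful assistant fluent in SEA languages."},
    {"role": "user", "content": "Terjemahkan ke bahasa Inggris: Selamat pagi."},
])
print(prompt)
```

With `transformers`, the equivalent is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which uses the template shipped with the model and is the safer choice.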
Good For
- Applications requiring high-quality language understanding and generation in Indonesian, Thai, Vietnamese, Malay, and Lao.
- Research and commercial use under the Apache 2.0 License, with specific considerations for Qwen's license for large-scale commercial deployments (over 100 million monthly active users).
- Developers looking for a model with a 32,768-token context length optimized for the linguistic nuances of the SEA region.
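As a Qwen-based model on the Hugging Face Hub, Sailor-7B can be loaded through the standard `transformers` path. The sketch below assumes the repo id `sail/Sailor-7B` and illustrative sampling defaults; it is not an official usage recipe, and the weights (several gigabytes) are only fetched when `load_sailor()` is actually called.

```python
MODEL_ID = "sail/Sailor-7B"  # assumed Hugging Face repo id
MAX_CONTEXT = 32768          # context length stated in the model card

def generation_config(max_new_tokens=256, temperature=0.7):
    """Illustrative sampling settings for open-ended SEA-language completion."""
    return {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": temperature > 0,  # greedy decoding when temperature is 0
        "top_p": 0.9,
    }

def load_sailor(model_id=MODEL_ID):
    """Load tokenizer and model; requires `transformers` and `torch`,
    and downloads the full weights on first use."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_sailor()
    prompt = "Ibu kota Indonesia adalah"  # Indonesian: "The capital of Indonesia is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, **generation_config())
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Base-model completion prompts like the one above work for all five SEA languages; for the Sailor-Chat variants, use the tokenizer's chat template instead of raw completion prompts.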