Sahabat-AI/gemma2-9b-cpt-sahabatai-v1-instruct
Text Generation · Concurrency Cost: 1 · Model Size: 9B · Quant: FP8 · Ctx Length: 16k · Published: May 30, 2025 · License: gemma · Architecture: Transformer

Gemma2 9B CPT Sahabat-AI v1 Instruct is a 9-billion-parameter, instruction-tuned, decoder-only language model developed by PT GoTo Gojek Tokopedia Tbk and AI Singapore, based on the Gemma2 architecture. It is fine-tuned for the Indonesian language and its dialects (Javanese, Sundanese), alongside English, and supports a context length of 8192 tokens. The model performs strongly on general language and instruction-following tasks across these languages, with superior results on the SEA HELM and IndoMMLU benchmarks for Indonesian and its dialects.
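
A minimal loading-and-generation sketch, assuming the standard Hugging Face transformers API and a chat template shipped with the instruct checkpoint; the dtype and device settings are illustrative and depend on your hardware:

```python
# Minimal sketch (assumptions: transformers >= 4.42 with Gemma2 support,
# a GPU with enough memory for bf16 9B weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sahabat-AI/gemma2-9b-cpt-sahabatai-v1-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

# Indonesian prompt: "What is the capital of Indonesia?"
messages = [{"role": "user", "content": "Apa ibu kota Indonesia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```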


Sahabat-AI/gemma2-9b-cpt-sahabatai-v1-instruct Overview

This model is a 9-billion-parameter instruction-tuned variant of the Gemma2 architecture, developed by PT GoTo Gojek Tokopedia Tbk and AI Singapore. It is part of the Sahabat-AI ecosystem, which develops large language models for the Indonesian language and its various dialects. The model was fine-tuned on approximately 448,000 Indonesian instruction-completion pairs, supplemented by 96,000 Javanese pairs, 98,000 Sundanese pairs, and an additional 129,000 English pairs.

Key Capabilities & Performance

  • Multilingual Proficiency: Optimized for English, Indonesian, Javanese, and Sundanese, making it highly suitable for applications requiring understanding and generation in these languages.
  • Instruction Following: Evaluated on the IFEval dataset, the model shows strong adherence to constraints given in prompts, including responding in a requested language (see the sketch after this list).
  • Benchmark Performance: Achieves an overall score of 61.169% on SEA HELM (BHASA) across Indonesian, Javanese, and Sundanese, outperforming other 7B–9B models such as Qwen2, Llama-3, and sea-lionv2.1. It also scores 62.6% on IndoMMLU and 33.67% on English benchmarks, indicating robust general language capabilities.
  • Context Length: Features a context length of 8192 tokens, allowing for processing longer inputs and generating more coherent responses.
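
To illustrate the constraint-following behavior described above, here is a hedged sketch that reuses the model and tokenizer from the loading example. The prompt is hypothetical: it asks, in Indonesian, for a two-sentence explanation and requests the answer in Javanese.

```python
# Continues from the loading sketch above (`model` and `tokenizer` already set up).
# Hypothetical IFEval-style prompt with an explicit language constraint:
# "Explain what gotong royong is in two sentences. Answer in Javanese."
messages = [{
    "role": "user",
    "content": "Jelaskan apa itu gotong royong dalam dua kalimat. "
               "Jawab dalam bahasa Jawa.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```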

Use Cases

This model is particularly well-suited for applications requiring high-quality language understanding and generation in Indonesian, Javanese, and Sundanese, such as:

  • Customer Support: Building chatbots or virtual assistants for Indonesian-speaking users (a minimal chat sketch follows this list).
  • Content Creation: Generating text, summaries, or translations in Indonesian and its dialects.
  • Educational Tools: Developing language learning applications or educational content for these languages.
  • Research: Exploring multilingual NLP tasks and instruction-following in low-resource languages.
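
As a use-case illustration, the sketch below frames a customer-support turn in Indonesian, again reusing the model and tokenizer from the loading example. The scenario and prompt are hypothetical, and the role-play instruction is folded into the user turn because Gemma2 chat templates typically do not accept a separate system role.

```python
# Continues from the loading sketch above. Hypothetical support scenario:
# "You are a customer-service assistant for an online store."
# "Customer: My order hasn't arrived; how do I track it?"
messages = [{
    "role": "user",
    "content": "Kamu adalah asisten layanan pelanggan untuk sebuah toko online. "
               "Pelanggan: Pesanan saya belum sampai, bagaimana cara melacaknya?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,     # sampling for more natural conversational replies
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```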