haoranxu/X-ALMA-13B-Pretrain

Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4K · Published: Jun 27, 2024 · License: MIT · Architecture: Transformer · Open weights

X-ALMA-13B-Pretrain is a 13 billion parameter multilingual pre-trained base model developed by Haoran Xu, expanding upon the ALMA-R architecture. It supports 50 languages through a plug-and-play design with language-specific modules and a specialized training recipe. This model is primarily designed for high-quality translation tasks and multilingual open-ended question answering.


X-ALMA-13B-Pretrain Overview

X-ALMA-13B-Pretrain is a 13 billion parameter multilingual pre-trained base model, developed by Haoran Xu, that significantly expands language support from 6 to 50 languages compared to its predecessor, ALMA-R. This model utilizes a unique plug-and-play architecture, incorporating language-specific modules alongside a carefully designed training methodology to achieve its broad linguistic coverage.

Key Capabilities

  • Extensive Multilingual Support: Pre-trained on 50 languages, including English, Chinese, Japanese, Korean, German, French, Spanish, Arabic, and many others.
  • Modular Architecture: Employs a plug-and-play design with language-specific modules, allowing for flexible integration and potentially efficient scaling.
  • Translation Focus: Primarily designed for high-quality machine translation, demonstrated with Chinese-to-English examples.
  • Multilingual QA: Capable of multilingual open-ended question answering.
  • Flexible Loading Options: Supports loading as a merged model (recommended for ease of use), as a base model with a specific language module, or as a base model with all language-specific modules (requiring substantial GPU memory).
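The three loading options above can be sketched as follows. This is a minimal, hedged example: the per-group merged repo IDs (e.g. `haoranxu/X-ALMA-13B-Group1`) follow the X-ALMA naming scheme and should be verified on the Hugging Face Hub before use.

```python
def xalma_repo(option: str, group: int = 1) -> str:
    """Return the Hugging Face repo ID for a given loading option.

    "merged": a merged per-group model (recommended for ease of use);
    "base":   the pre-trained base, to be combined with language-specific
              modules (e.g. via PEFT adapters).
    """
    if option == "merged":
        # Assumed naming scheme for the merged per-group checkpoints.
        return f"haoranxu/X-ALMA-13B-Group{group}"
    if option == "base":
        return "haoranxu/X-ALMA-13B-Pretrain"
    raise ValueError(f"unknown loading option: {option!r}")


# Actually loading the weights requires a GPU host and downloads ~13B
# parameters, so it is left commented out here:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     xalma_repo("merged"), torch_dtype="auto", device_map="auto")
```

Loading all language-specific modules at once is also possible but, as noted above, requires substantially more GPU memory than loading a single merged group.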

Good For

  • Multilingual Translation: Ideal for applications requiring translation across a wide array of languages.
  • Multilingual NLP Research: Provides a strong foundation for research in multilingual natural language processing, especially concerning modular architectures.
  • Resource-Efficient Deployment: The modular design allows for loading only necessary language modules, potentially optimizing resource usage for specific language pairs.
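For translation use cases, the ALMA family conventionally uses a simple instruction-style prompt. The helper below assumes that format; confirm the exact template against the upstream model card before relying on it.

```python
def translation_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Build an ALMA-style translation prompt (assumed format)."""
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}:"
    )


prompt = translation_prompt("Chinese", "English", "我爱机器翻译。")
# The prompt is then tokenized and passed to model.generate(...);
# the model completes the line after the target-language label.
```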

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each config sets the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p