haoranxu/ALMA-13B

Text generation · Model size: 13B · Quantization: FP8 · Context length: 4K · Published: Sep 17, 2023 · License: MIT · Architecture: Transformer · Open weights

ALMA-13B is a 13 billion parameter language model developed by Haoran Xu and collaborators, based on the LLaMA-2 architecture. It is specifically designed for machine translation, using a two-step fine-tuning process: training on monolingual data followed by training on high-quality parallel data. The result is strong translation performance across language pairs.


ALMA-13B: Advanced Language Model-based Translator

ALMA-13B is a 13 billion parameter model from the ALMA (Advanced Language Model-based Translator) family, developed by Haoran Xu and collaborators. It represents a novel paradigm in machine translation, built upon the LLaMA-2 architecture and optimized through a unique two-stage fine-tuning process.

Key Capabilities & Training

  • Two-Step Fine-tuning: ALMA models are initially fine-tuned on a large corpus of monolingual data (12 billion tokens for ALMA-13B) to establish strong language understanding. This is followed by a second stage of fine-tuning on high-quality human-written parallel data, specifically targeting translation performance.
  • Translation Optimization: The model is explicitly designed and optimized for machine translation tasks, aiming to deliver robust and accurate cross-language text conversion.
  • ALMA-R Variants: Newer ALMA-R versions (e.g., ALMA-13B-R) further enhance translation quality by incorporating Contrastive Preference Optimization (CPO) on triplet preference data, and have been shown to match or exceed the performance of GPT-4 and WMT competition winners.
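
The CPO objective mentioned above can be illustrated with a simplified, scalar sketch: a DPO-style preference term with the reference model dropped, plus a negative log-likelihood regularizer on the preferred translation. The function and argument names below are illustrative, not taken from the authors' code.

```python
import math

def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Per-example CPO loss (simplified scalar sketch, not the reference implementation).

    logp_chosen / logp_rejected are the policy's sequence log-probabilities of
    the preferred and dispreferred translations from a preference triplet.
    """
    # Preference term: -log sigmoid(beta * (logp_chosen - logp_rejected)).
    # Unlike DPO, no frozen reference model appears in the margin.
    margin = beta * (logp_chosen - logp_rejected)
    pref = math.log1p(math.exp(-margin))  # numerically stable -log sigmoid(margin)
    # Behavior-cloning regularizer: keep the policy close to the preferred output.
    nll = -logp_chosen
    return pref + nll
```

A wider log-probability margin between the preferred and dispreferred translation shrinks the preference term, pushing the model to rank the better translation higher.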

Use Cases

  • Machine Translation: Ideal for applications requiring high-quality translation between languages.
  • Research & Development: Provides a strong baseline for further research into LLM-based translation paradigms and preference optimization techniques.
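
As a sketch of how the model might be used for translation via the Hugging Face `transformers` library: the prompt template below follows the translation format described by the ALMA authors, while the decoding parameters and helper names are illustrative assumptions, not the reference implementation.

```python
def build_prompt(text: str, src_lang: str, tgt_lang: str) -> str:
    """Wrap a source sentence in ALMA's translation prompt format."""
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}:"
    )

def translate(text: str, src_lang: str = "Chinese", tgt_lang: str = "English") -> str:
    """Generate a translation with ALMA-13B.

    Requires `torch` and `transformers`, plus enough disk/VRAM for the
    13B weights (roughly 26 GB at float16).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "haoranxu/ALMA-13B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    prompt = build_prompt(text, src_lang, tgt_lang)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Beam search is a common decoding choice for MT; greedy also works.
    output = model.generate(**inputs, num_beams=5, max_new_tokens=256)
    # Strip the prompt tokens and keep only the generated continuation.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()
```

Keeping the prompt construction separate from model loading makes the template easy to reuse for batch translation or for the ALMA-R variants, which share the same prompt format.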