haoranxu/ALMA-7B

Text Generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Sep 17, 2023 · License: MIT · Architecture: Transformer · Open weights

ALMA-7B is a 7 billion parameter language model developed by Haoran Xu, based on the LLaMA-2 architecture. It is designed specifically for machine translation, using a two-step fine-tuning process: initial training on 20 billion monolingual tokens, followed by optimization on high-quality human-written parallel data. This specialized approach gives it strong performance on LLM-based translation tasks.


ALMA-7B: Advanced Language Model-based Translator

ALMA-7B is a 7 billion parameter model built upon the LLaMA-2 architecture, developed by Haoran Xu. It introduces a novel paradigm for machine translation, focusing on a two-stage fine-tuning process to achieve strong translation performance.

Key Capabilities and Training:

  • Specialized Translation: Fine-tuned specifically for machine translation, rather than treating translation as one task among many for a general-purpose LLM.
  • Two-Step Fine-tuning: The model undergoes initial full-weight fine-tuning on 20 billion monolingual tokens, followed by further full-weight fine-tuning on high-quality human-written parallel data.
  • ALMA-R Variant: A newer variant, ALMA-7B-R, builds upon ALMA-7B-LoRA by incorporating Contrastive Preference Optimization (CPO) using triplet preference data, and has been shown to match or exceed the translation performance of models like GPT-4 and WMT competition winners.
  • Research-Backed: The methodology and results are detailed in the paper "A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models" (arXiv:2309.11674).
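The model can be run with the Hugging Face `transformers` library. The sketch below is a minimal example, not an official snippet from the model card: the instruction-style prompt format (`Translate this from X to Y: ...`) follows the convention used in the ALMA repository, and generation parameters such as beam count are illustrative assumptions.

```python
def build_prompt(source_text: str, src_lang: str = "Chinese", tgt_lang: str = "English") -> str:
    # ALMA-style plain instruction prompt for translation.
    return f"Translate this from {src_lang} to {tgt_lang}:\n{src_lang}: {source_text}\n{tgt_lang}:"

def translate(text: str, src_lang: str = "Chinese", tgt_lang: str = "English") -> str:
    # Heavy imports are deferred so the prompt helper stays usable on its own.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-7B")
    model = AutoModelForCausalLM.from_pretrained(
        "haoranxu/ALMA-7B",
        torch_dtype=torch.float16,  # assumption: FP16 inference on GPU
        device_map="auto",
    )

    prompt = build_prompt(text, src_lang, tgt_lang)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=256,
            num_beams=5,       # illustrative: beam search for translation quality
            do_sample=False,
        )
    decoded = tokenizer.decode(output[0], skip_special_tokens=True)
    # Strip the echoed prompt, keeping only the generated translation.
    return decoded[len(prompt):].strip()
```

A call like `translate("我爱机器翻译。")` would return the English translation; swap `src_lang`/`tgt_lang` for other directions supported by the model.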

Use Cases:

  • High-Quality Machine Translation: Ideal for applications requiring accurate and nuanced translation between languages.
  • Research and Development: Provides a strong baseline and advanced techniques for researchers exploring LLM-based translation and preference optimization methods.
  • Integration into Translation Workflows: Can be used as a core component in systems requiring robust language translation capabilities.