haoranxu/ALMA-13B-Pretrain

  • Visibility: Public
  • Parameters: 13B
  • Quantization: FP8
  • Context length: 4096
  • Released: Sep 17, 2023
  • License: MIT
  • Source: Hugging Face
Overview

ALMA-13B-Pretrain: Foundation for Advanced Machine Translation

ALMA-13B-Pretrain is a 13-billion-parameter model based on LLaMA-2-13B, developed by Haoran Xu and collaborators. It is the output of the first stage of the ALMA (Advanced Language Model-based trAnslator) recipe, a two-stage fine-tuning paradigm for boosting the translation performance of large language models. In this stage, the base model was fine-tuned on 12 billion tokens of monolingual data.

Key Characteristics

  • Two-Stage Fine-tuning Paradigm: ALMA models are initially fine-tuned on monolingual data (as seen in ALMA-13B-Pretrain) and then further optimized with high-quality parallel data for translation tasks.
  • Foundation for ALMA-13B-LoRA: This model (haoranxu/ALMA-13B-Pretrain) is meant to be loaded together with its corresponding LoRA adapter (haoranxu/ALMA-13B-Pretrain-LoRA) to form the full ALMA-13B-LoRA translation model; see the usage sketch after this list.
  • ALMA-R Series: The ALMA-R models build on ALMA by applying Contrastive Preference Optimization (CPO) for further LoRA fine-tuning, with ALMA-13B-R matching or exceeding GPT-4 and WMT competition winners in translation performance.
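The snippet below is a minimal sketch of that base-plus-adapter pairing, assuming the transformers and peft libraries: it loads the base weights in FP16, attaches the published LoRA adapter, and generates a translation from an instruction-style prompt. The prompt wording and generation settings here are illustrative assumptions, not values prescribed by the model card.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the ALMA-13B-Pretrain base and attach its LoRA adapter (together: ALMA-13B-LoRA).
base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "haoranxu/ALMA-13B-Pretrain-LoRA")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain", padding_side="left")

# Illustrative translation-style prompt; the exact template is an assumption here.
prompt = "Translate this from German to English:\nGerman: Maschinelle Übersetzung ist faszinierend.\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    generated = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=40)

print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```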

Intended Use

  • Base for Translation Fine-tuning: This model is intended as a base for further LoRA fine-tuning on high-quality parallel data to create specialized machine translation systems (a fine-tuning sketch follows this list).
  • Research and Development: Suitable for researchers exploring advanced fine-tuning techniques for LLM-based translation, particularly those interested in the ALMA paradigm and Contrastive Preference Optimization.
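As a rough illustration of that second stage, the sketch below attaches fresh LoRA adapters to the base model with the peft library and runs a standard causal-LM fine-tuning loop over parallel data formatted as translation prompts. The target modules, rank, learning rate, and data format are placeholder assumptions, not the hyperparameters behind the official ALMA-13B-LoRA checkpoint.

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder LoRA settings; the official ALMA recipe may differ.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora_cfg)

# Toy parallel data rendered as translation prompts (hypothetical example pair).
pairs = [{"text": "Translate this from German to English:\nGerman: Guten Morgen.\nEnglish: Good morning."}]

def tokenize(example):
    out = tokenizer(example["text"], truncation=True, max_length=512)
    out["labels"] = out["input_ids"].copy()  # standard causal-LM objective
    return out

train_ds = Dataset.from_list(pairs).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="alma-lora-ft", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4, logging_steps=1),
    train_dataset=train_ds,
)
trainer.train()
```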