Overview
ALMA-13B-Pretrain: Foundation for Advanced Machine Translation
ALMA-13B-Pretrain is a 13-billion-parameter model based on LLaMA-2, developed by Haoran Xu and collaborators. It is the output of the first stage of the ALMA (Advanced Language Model-based trAnslator) paradigm, which aims to boost the translation performance of large language models. At this stage, the model has been fine-tuned on 12 billion tokens of monolingual data.
Key Characteristics
- Two-Stage Fine-tuning Paradigm: ALMA models are initially fine-tuned on monolingual data (as seen in ALMA-13B-Pretrain) and then further optimized with high-quality parallel data for translation tasks.
- Foundation for ALMA-13B-LoRA: This specific model (haoranxu/ALMA-13B-Pretrain) is designed to be used together with its corresponding LoRA model (haoranxu/ALMA-13B-Pretrain-LoRA) to perform translation (see the usage sketch after this list).
- ALMA-R Series: Builds upon ALMA models, applying Contrastive Preference Optimization (CPO) for further LoRA fine-tuning; ALMA-13B-R matches or exceeds GPT-4 and WMT competition winners in translation performance.
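A minimal usage sketch for pairing the base model with its LoRA weights, assuming the `transformers` and `peft` libraries and an ALMA-style translation prompt; the language pair and generation settings below are illustrative choices, not the authors' exact configuration:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Load the monolingual-fine-tuned base model and attach the translation LoRA weights
model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, "haoranxu/ALMA-13B-Pretrain-LoRA")
tokenizer = LlamaTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain", padding_side="left")

# ALMA-style prompt: "Translate this from <source> to <target>:" followed by the source sentence
prompt = (
    "Translate this from German to English:\n"
    "German: Maschinelle Übersetzung ist faszinierend.\n"
    "English:"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Beam-search decoding; parameters here are placeholders to adjust per use case
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=64)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```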
Intended Use
- Base for Translation Fine-tuning: This model is intended as a base for further LoRA fine-tuning with parallel data to create specialized machine translation systems (a minimal setup sketch follows this list).
- Research and Development: Suitable for researchers exploring advanced fine-tuning techniques for LLM-based translation, particularly those interested in the ALMA paradigm and Contrastive Preference Optimization.
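A rough sketch of what a second-stage LoRA setup on top of this base could look like with `peft`; the LoRA hyperparameters, target modules, and the toy parallel example are illustrative assumptions, not the authors' published recipe:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, LlamaTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = LlamaTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain")
tokenizer.pad_token = tokenizer.eos_token

# Wrap the base model with LoRA adapters; rank and target modules are placeholder choices
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

# One toy parallel example formatted as a translation prompt plus its reference
example = (
    "Translate this from German to English:\n"
    "German: Maschinelle Übersetzung ist faszinierend.\n"
    "English: Machine translation is fascinating."
)
batch = tokenizer(example, return_tensors="pt").to(model.device)
batch["labels"] = batch["input_ids"].clone()

# A single illustrative optimization step over the LoRA parameters only
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-4
)
loss = model(**batch).loss
loss.backward()
optimizer.step()
```

In practice the parallel data would be batched and iterated over a full training loop (or passed to a standard `transformers` trainer); the single step above only shows how the base model, LoRA adapters, and prompt-formatted parallel pairs fit together.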