ALMA-13B: Advanced Language Model-based Translator
ALMA-13B is a 13 billion parameter model from the ALMA (Advanced Language Model-based Translator) family, developed by Haoran Xu and collaborators. It follows a new training paradigm for machine translation: the model is built on the LLaMA-2 architecture and optimized through a two-stage fine-tuning process.
Key Capabilities & Training
- Two-Step Fine-tuning: ALMA models are initially fine-tuned on a large corpus of monolingual data (12 billion tokens for ALMA-13B) to establish strong language understanding. This is followed by a second stage of fine-tuning on high-quality human-written parallel data, specifically targeting translation performance.
- Translation Optimization: The model is explicitly designed and optimized for machine translation, aiming to deliver robust, accurate translation across language pairs.
- ALMA-R Variants: Newer ALMA-R versions (e.g., ALMA-13B-R) further enhance translation quality through Contrastive Preference Optimization (CPO) on triplet preference data, which has been shown to match or exceed the performance of models such as GPT-4 and WMT competition winners.
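To make the CPO step concrete, the sketch below implements a simplified, scalar version of a CPO-style objective: a DPO-like preference term (without a frozen reference model) plus a negative log-likelihood regularizer on the preferred translation. The function name, scalar sequence log-probabilities, and `beta` value are illustrative assumptions, not the authors' implementation, which operates on batched token log-probabilities.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def cpo_loss(logp_preferred: float,
             logp_dispreferred: float,
             beta: float = 0.1) -> float:
    """Simplified CPO-style objective on scalar sequence log-probs
    (illustrative sketch, not the authors' API)."""
    # Preference term: push the preferred translation's likelihood
    # above the dispreferred one's. Unlike DPO, there is no frozen
    # reference model in the margin.
    prefer = -math.log(sigmoid(beta * (logp_preferred - logp_dispreferred)))
    # NLL term: keep the model anchored to the preferred output.
    nll = -logp_preferred
    return prefer + nll

# Toy log-probabilities: the loss is lower when the model already
# prefers the better translation.
loss_good = cpo_loss(logp_preferred=-5.0, logp_dispreferred=-9.0)
loss_bad = cpo_loss(logp_preferred=-9.0, logp_dispreferred=-5.0)
assert loss_good < loss_bad
```

In the real training setup, the triplet preference data supplies the preferred and dispreferred translations for each source sentence, and the log-probabilities come from the model being fine-tuned.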
Use Cases
- Machine Translation: Ideal for applications requiring high-quality translation between languages.
- Research & Development: Provides a strong baseline for further research into LLM-based translation paradigms and preference optimization techniques.
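For translation use, the ALMA repository's examples use a simple instruction-style prompt of the form "Translate this from X to Y:". The helper below is an illustrative sketch of that prompt format; the function name is an assumption, and the model identifier `haoranxu/ALMA-13B` refers to the authors' Hugging Face release.

```python
def build_alma_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Build the instruction-style translation prompt used in the
    ALMA repository's examples (helper name is illustrative)."""
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}:"
    )

prompt = build_alma_prompt("German", "English", "Guten Morgen!")
# The prompt is then tokenized and passed to a causal LM loaded via
# Hugging Face transformers, e.g.
#   AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B")
# followed by model.generate(...) and decoding of the continuation.
```

The model completes the text after the final `English:` marker, so the translation is whatever the model generates past the prompt.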