haoranxu/ALMA-7B
Text generation · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Sep 17, 2023 · License: MIT · Architecture: Transformer · Open weights
ALMA-7B is a 7 billion parameter language model developed by Haoran Xu, based on the LLaMA-2 architecture. It is designed specifically for machine translation, using a two-step fine-tuning process: initial training on 20 billion monolingual tokens, followed by fine-tuning on high-quality human-written parallel data. This specialized approach makes the model well suited to LLM-based translation tasks.
ALMA-7B: Advanced Language Model-based Translator
ALMA-7B is a 7 billion parameter model built upon the LLaMA-2 architecture, developed by Haoran Xu. It introduces a novel paradigm for machine translation, focusing on a two-stage fine-tuning process to achieve strong translation performance.
Key Capabilities and Training:
- Specialized Translation: Designed from the ground up for machine translation, moving beyond general-purpose LLMs for this specific task.
- Two-Step Fine-tuning: The model undergoes initial full-weight fine-tuning on 20 billion monolingual tokens, followed by further full-weight fine-tuning on high-quality human-written parallel data.
- ALMA-R Variant: A newer variant, ALMA-7B-R, builds upon ALMA-7B-LoRA by incorporating Contrastive Preference Optimization (CPO) using triplet preference data, and has been shown to match or exceed the translation performance of models such as GPT-4 and WMT competition winners.
- Research-Backed: The methodology and results are detailed in the paper "A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models" (arXiv:2309.11674).
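Because ALMA is fine-tuned for translation, inference is driven by a fixed instruction-style prompt rather than free-form chat. The sketch below builds that prompt; the exact template follows the convention published in the ALMA repository ("Translate this from X to Y: ..."), so treat the wording as an assumption and check it against the official examples before relying on it.

```python
def build_alma_prompt(source_lang: str, target_lang: str, text: str) -> str:
    """Build a translation prompt in the style used by the ALMA models.

    NOTE: the template wording is taken from the ALMA repository's
    published examples; verify it against the official usage docs.
    """
    return (
        f"Translate this from {source_lang} to {target_lang}:\n"
        f"{source_lang}: {text}\n"
        f"{target_lang}:"
    )


prompt = build_alma_prompt("German", "English", "Hallo Welt")
# The resulting string is then tokenized and passed to the model's
# generate() method (e.g. via Hugging Face transformers); the model
# completes the line after "English:" with the translation.
print(prompt)
```

The prompt string itself is the whole interface: no special chat tokens are required, since ALMA's fine-tuning data uses this plain instruction format.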
Use Cases:
- High-Quality Machine Translation: Ideal for applications requiring accurate and nuanced translation between languages.
- Research and Development: Provides a strong baseline and advanced techniques for researchers exploring LLM-based translation and preference optimization methods.
- Integration into Translation Workflows: Can be used as a core component in systems requiring robust language translation capabilities.
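For researchers exploring the preference-optimization angle mentioned above, the core idea of CPO can be illustrated with a deliberately simplified, per-example sketch: a contrastive term that pushes the log-probability of a preferred (chosen) translation above a dispreferred (rejected) one, plus a negative log-likelihood regularizer on the preferred translation. Real implementations operate on batched sequence-level log-probabilities; the scalar form and the `beta` value here are illustrative assumptions, not the paper's exact formulation.

```python
import math


def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Simplified per-example sketch of a CPO-style objective.

    logp_chosen / logp_rejected: sequence log-probabilities of the
    preferred and dispreferred translations under the model.
    NOTE: illustrative only; the actual CPO loss in the ALMA-R paper
    is defined over batches and derived as a DPO approximation.
    """
    margin = beta * (logp_chosen - logp_rejected)
    # Contrastive preference term: -log sigmoid(margin)
    prefer = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # NLL regularizer keeping the model close to the preferred outputs
    nll = -logp_chosen
    return prefer + nll


# A wider margin between chosen and rejected lowers the preference term.
loss = cpo_loss(logp_chosen=-1.0, logp_rejected=-2.0)
```

The key design point is the contrastive term: unlike plain supervised fine-tuning, it uses the rejected translation as an explicit negative, which is what lets preference data sharpen translation quality beyond what reference-matching alone achieves.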