haoranxu/ALMA-7B-Pretrain

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Sep 17, 2023 · License: MIT · Architecture: Transformer · Open Weights

The ALMA-7B-Pretrain model by Haoran Xu is a 7 billion parameter language model with a 4096-token context length that serves as the base for the ALMA translation models. It has already undergone an initial fine-tuning phase on monolingual data, providing the foundation for subsequent translation-specific fine-tuning. It is designed to be combined with LoRA weights for machine translation tasks rather than used as a standalone translation model.


ALMA-7B-Pretrain: A Foundation for Advanced Machine Translation

This model, haoranxu/ALMA-7B-Pretrain, is a 7 billion parameter language model based on LLaMA-2-7B, specifically designed as a pre-trained base for the ALMA (Advanced Language Model-based Translator) series. It has undergone an initial fine-tuning phase on 20 billion monolingual tokens, establishing a strong linguistic foundation.

Key Characteristics:

  • Pre-training Stage: Represents the first stage of the ALMA translation paradigm, focusing on monolingual data fine-tuning.
  • Not a Standalone Translator: This Pretrain version is explicitly noted as not a translation model on its own. It requires further fine-tuning with LoRA weights to become a functional translator.
  • Foundation for ALMA-7B-LoRA and ALMA-7B-R: It serves as the base model for ALMA-7B-LoRA (which adds LoRA fine-tuning on human-written parallel data) and ALMA-7B-R (which further applies Contrastive Preference Optimization).

Intended Use:

This model is intended to be used in conjunction with its corresponding LoRA models (e.g., haoranxu/ALMA-7B-Pretrain-LoRA or haoranxu/ALMA-7B-R) to perform high-quality machine translation. Developers should load this base model and then apply the specific LoRA weights for translation, following the two-stage fine-tuning paradigm described in the ALMA paper.
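
As a rough sketch of that workflow, the snippet below loads this base model with Hugging Face transformers and attaches the ALMA-7B-Pretrain-LoRA adapter via peft before generating a translation. The prompt wording, language pair, and generation settings here are assumptions for illustration and should be checked against the ALMA repository.

```python
# Minimal sketch (assumes a CUDA GPU and the transformers + peft libraries;
# prompt format and generation parameters are illustrative, not prescriptive).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

base_id = "haoranxu/ALMA-7B-Pretrain"       # this base model
lora_id = "haoranxu/ALMA-7B-Pretrain-LoRA"  # translation LoRA weights trained on top of it

# Load the monolingually fine-tuned base, then attach the LoRA adapter via PEFT.
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, lora_id)
tokenizer = LlamaTokenizer.from_pretrained(base_id, padding_side="left")

# Instruction-style translation prompt; the source sentence and direction are examples.
prompt = "Translate this from German to English:\nGerman: Maschinelles Lernen ist faszinierend.\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=64)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```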