Plume256k: A Parallel Language Model for Neural Machine Translation
Plume256k is a 2.6-billion-parameter language model developed by the Language Technologies Unit at the Barcelona Supercomputing Center (BSC). It belongs to the Plume (Parallel Language Model) collection, the first LLMs trained exclusively on parallel data, in this case Catalan-centric translation examples, for Neural Machine Translation (NMT). The model shares the architecture of Gemma 2B but is distinguished by its training methodology: it relies solely on parallel data rather than iterative instruction fine-tuning or continual pre-training.
Key Capabilities
- Specialized NMT: Designed from scratch for general translation tasks at the sentence level.
- Multilingual Translation: Proficient in 16 supervised translation directions involving Catalan, and capable of 56 additional zero-shot translation directions.
- Supported Languages: In addition to Catalan, covers Spanish, French, Italian, Portuguese, Galician, German, English, and Basque.
- Comparable Performance: Achieves translation quality comparable to earlier encoder-decoder architectures such as NLLB-1.3B and NLLB-600M on standard benchmarks, including Flores-200 and NTREX.
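
The translation directions above are selected at inference time by prompting the model with language tags. As a rough illustration, here is a minimal Python sketch; the Hugging Face repo id, the tag codes, and the exact prompt layout are assumptions for illustration only, so consult the official Plume model card for the real format:

```python
# Hypothetical prompt-building helper for a parallel-data-only NMT LLM.
# The "<s> [src_tag] ... [tgt_tag]" layout and the Flores-style language
# codes are illustrative assumptions, not Plume's confirmed format.

def build_prompt(src_tag: str, tgt_tag: str, sentence: str) -> str:
    """Frame the source sentence with a source-language tag and end the
    prompt with the target-language tag, which cues the model to generate
    the translation in that language."""
    return f"<s> [{src_tag}] {sentence} \n[{tgt_tag}]"

prompt = build_prompt("cat_Latn", "eng_Latn", "El món és gran.")

# Actual generation would follow the standard transformers pattern below
# (commented out here, since it downloads a 2.6B-parameter checkpoint):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "projecte-aina/Plume256k"  # assumed repo id
# tok = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id)
# inputs = tok(prompt, return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=64)
# print(tok.decode(out[0], skip_special_tokens=True))
```

Zero-shot directions would use the same pattern with a source-target pair never seen together during training, which is what makes the 56 additional directions possible.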
Good for
- Catalan-centric Translation: Ideal for applications requiring high-quality translation involving Catalan.
- General Multilingual Translation: Suitable for broader translation needs across its supported language set.
- Research in NMT: Provides a unique case study for investigating LLMs trained purely on parallel data, as detailed in its accompanying arXiv paper.