Unbabel/Tower-Plus-9B

Text generation · Model size: 9B · Quantization: FP8 · Context length: 16k · Concurrency cost: 1 · Published: Jun 9, 2025 · License: cc-by-nc-sa-4.0 · Architecture: Transformer · Open weights (available on Hugging Face)

Unbabel/Tower-Plus-9B: Multilingual LLM for Translation and General Tasks

Unbabel/Tower-Plus-9B is a 9-billion-parameter language model developed by Unbabel, built on Gemma 2 9B. It was trained through Continuous Pretraining (CPT), Instruction Tuning (IT), and Weighted Preference Optimization (WPO) on extensive parallel and multilingual data covering 22 languages. This training regimen positions Tower-Plus-9B as a leading multilingual LLM under 10B parameters, with particular strength in machine translation.

Key Capabilities

  • Exceptional Multilingual Machine Translation: Optimized for translation-related tasks, with particular strength in machine translation across its supported languages (see the prompt sketch after this list).
  • General Instruction Following: Capable of handling a variety of general instruction-following tasks, including reasoning and code instructions.
  • Multilingual Data Generation: Effective for creating multilingual synthetic data by translating instructions and answers or generating instructions from seed documents.
  • Broad Language Support: Covers 22 languages, including German, Spanish, French, Italian, Korean, Dutch, Russian, English, Portuguese (Portugal), Portuguese (Brazil), Chinese (Simplified/Traditional), Japanese, and more.
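
As a concrete illustration of the translation capability, here is a minimal sketch using the Hugging Face transformers chat pipeline. The prompt wording and generation settings are illustrative assumptions, not Unbabel's official recommended format; consult the model card for the exact prompt template.

```python
# Minimal translation sketch (illustrative prompt; verify the exact
# format against the model card). Requires a recent transformers
# version with chat-message support in the text-generation pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Unbabel/Tower-Plus-9B",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": (
            "Translate the following text from English into German.\n"
            "English: The weather in Lisbon is beautiful today.\n"
            "German:"
        ),
    }
]

out = generator(messages, max_new_tokens=256, do_sample=False)
# The pipeline returns the full conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```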

Intended Use Cases

  • Machine Translation: Ideal for high-quality translation between supported languages.
  • Multilingual Applications: Suitable for various multilingual tasks beyond direct translation.
  • Synthetic Data Generation: Useful for generating diverse multilingual datasets for training other models or applications.

For optimal performance, it is recommended to serve the model with vLLM and to use correct prompt formatting (a minimal sketch follows). The model's development is detailed in the paper "Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs" (arXiv:2506.17080).
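
Since vLLM is the recommended serving stack, the sketch below shows offline inference with vLLM's Python API, letting `llm.chat()` apply the model's built-in chat template. The prompt and parameter values are assumptions for illustration.

```python
# Minimal vLLM sketch (illustrative values; check the model card for
# recommended settings). llm.chat() applies the model's chat template,
# handling the Gemma 2 style turn formatting automatically.
from vllm import LLM, SamplingParams

llm = LLM(model="Unbabel/Tower-Plus-9B", max_model_len=16384)

params = SamplingParams(temperature=0.0, max_tokens=256)

messages = [
    {
        "role": "user",
        "content": (
            "Translate the following text from English into Spanish.\n"
            "English: Quality translation at scale.\n"
            "Spanish:"
        ),
    }
]

outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```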

Popular Sampler Settings

The three most common sampler configurations used by Featherless users for this model tune the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
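
All seven of these knobs map directly onto vLLM's SamplingParams. The sketch below shows how such a configuration is expressed; the values are placeholders, not the actual user configurations.

```python
# Placeholder values only; the actual popular configurations are not
# reproduced here. Each field is a standard vLLM SamplingParams knob.
from vllm import SamplingParams

params = SamplingParams(
    temperature=0.7,         # randomness of token sampling
    top_p=0.9,               # nucleus sampling cutoff
    top_k=40,                # restrict to the k most likely tokens
    frequency_penalty=0.0,   # penalize tokens by how often they appear
    presence_penalty=0.0,    # penalize tokens that have appeared at all
    repetition_penalty=1.1,  # multiplicative penalty on repeated tokens
    min_p=0.05,              # drop tokens below this fraction of the top probability
)
```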