Model Overview
MBZUAI/bactrian-x-llama-7b-merged is a 7-billion-parameter language model built on the LLaMA architecture. It has been fine-tuned with Low-Rank Adaptation (LoRA) to improve its instruction-following ability across multiple languages. Its training data is a multilingual instruction dataset created by translating the English instructions from Stanford-Alpaca-52k and Databricks-Dolly-15k into 52 languages using the Google Translation API.
Key Capabilities
- Multilingual Instruction Following: Designed to understand and respond to instructions in 52 languages, making it versatile for global applications.
- LoRA Fine-tuning: Utilizes efficient LoRA fine-tuning with specific hyperparameters (e.g., lora_r=64, lora_target_modules='q_proj,k_proj,v_proj,o_proj') for adaptability.
- Replicable Training: The training methodology and code are publicly available, adapted from Alpaca-LoRA, allowing for reproducibility and further research.
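Since the model is an instruction-tuned LLaMA derivative, it can be queried with the Hugging Face transformers library. The sketch below is a minimal, hedged example: it assumes the model follows the standard Alpaca-style prompt template used by the Alpaca-LoRA codebase the card references, and the `build_prompt`/`generate` helper names are illustrative, not part of any official API.

```python
MODEL_ID = "MBZUAI/bactrian-x-llama-7b-merged"

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Format an instruction in the Alpaca-style template (assumed, per Alpaca-LoRA)."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

def generate(instruction: str, input_text: str = "", max_new_tokens: int = 256) -> str:
    """Load the merged checkpoint and generate a response (needs substantial memory)."""
    # Imported lazily so the prompt helper above runs without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(instruction, input_text), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

prompt = build_prompt("Translate 'good morning' to French.")
```

Because the LoRA weights are already merged into this checkpoint, no peft adapter loading step is needed; the model loads like any plain causal LM.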
Training Details
The model was trained for 10 epochs with a batch size of 128 and a cutoff length of 512 tokens. Outputs for the translated instructions were generated using gpt-3.5-turbo.
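The reported hyperparameters can be gathered in one place, together with a rough estimate of how few parameters LoRA actually trains. The config keys below follow Alpaca-LoRA naming conventions and include only values stated on this card; the parameter-count arithmetic assumes LLaMA-7B's published shape (32 layers, hidden size 4096).

```python
# Hyperparameters reported on the model card, in Alpaca-LoRA-style naming.
# Values not stated on the card (e.g., learning rate, micro-batch size) are omitted.
TRAINING_CONFIG = {
    "num_epochs": 10,
    "batch_size": 128,
    "cutoff_len": 512,  # max tokens per training example
    "lora_r": 64,       # LoRA rank
    "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# Rough count of trainable LoRA parameters: each targeted 4096x4096 attention
# projection gets two low-rank factors, A (r x 4096) and B (4096 x r),
# across 32 layers and the 4 targeted modules.
hidden, r, layers, modules = 4096, TRAINING_CONFIG["lora_r"], 32, len(TRAINING_CONFIG["lora_target_modules"])
trainable_params = layers * modules * 2 * hidden * r  # = 67,108,864 (~1% of 7B)
```

This order-of-magnitude estimate illustrates why LoRA fine-tuning is practical here: only tens of millions of parameters are updated rather than the full 7 billion.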
Considerations
Potential biases include translation artifacts introduced by the Google Translation API and an English-centric cultural bias inherited from the source datasets.