MBZUAI/bactrian-x-llama-7b-merged

Text generation · 7B parameters · FP8 quantization · 4k context length · MIT license · Transformer architecture

MBZUAI/bactrian-x-llama-7b-merged is a 7 billion parameter LLaMA-based model fine-tuned using low-rank adaptation (LoRA). It is trained on a multilingual instruction dataset derived from Stanford-Alpaca-52k and Databricks-Dolly-15k, translated into 52 languages. This model specializes in multilingual instruction-following, making it suitable for applications requiring understanding and generation across diverse languages.


Model Overview

MBZUAI/bactrian-x-llama-7b-merged is a 7 billion parameter language model built on the LLaMA architecture and fine-tuned with Low-Rank Adaptation (LoRA) to improve instruction following across many languages. Its training data is a multilingual instruction dataset created by translating the English instructions of Stanford-Alpaca-52k and Databricks-Dolly-15k into 52 languages with the Google Translate API.
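A minimal usage sketch with the Hugging Face transformers library is shown below. The Alpaca-style prompt template is an assumption (the card does not state the exact template used at training time), as are the precision and generation settings:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MBZUAI/bactrian-x-llama-7b-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision to fit a single GPU
    device_map="auto",
)

# Alpaca-style instruction/response template -- an assumption, not confirmed
# by the model card; verify against the released training code.
prompt = (
    "### Instruction:\n"
    "Translate the following sentence to French: 'The weather is nice today.'\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Because this is the merged checkpoint, the LoRA weights should already be folded into the base model, so no separate adapter loading is needed at inference time.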

Key Capabilities

  • Multilingual Instruction Following: Designed to understand and respond to instructions in 52 languages, making it versatile for global applications.
  • LoRA Fine-tuning: Adapted with parameter-efficient Low-Rank Adaptation (lora_r=64, lora_target_modules='q_proj,k_proj,v_proj,o_proj'), which trains small low-rank update matrices while the base LLaMA weights stay frozen; see the configuration sketch after this list.
  • Replicable Training: The training methodology and code are publicly available, adapted from Alpaca-LoRA, allowing for reproducibility and further research.
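The hyperparameters called out above can be expressed as a peft LoraConfig. Only the rank and target modules come from the model card; lora_alpha, dropout, and the remaining fields are assumptions for illustration:

```python
from peft import LoraConfig

# Sketch of the adapter configuration. r and target_modules are from the
# model card; alpha, dropout, and bias handling are assumed defaults.
lora_config = LoraConfig(
    r=64,                                                     # lora_r=64 (from the card)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # from the card
    lora_alpha=16,                                            # assumption: common default
    lora_dropout=0.05,                                        # assumption
    bias="none",
    task_type="CAUSAL_LM",
)
```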

Training Details

The model was trained for 10 epochs with a batch size of 128 and a cutoff (maximum sequence) length of 512 tokens. Response outputs for the translated instructions were generated with gpt-3.5-turbo.
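A sketch of this setup using transformers TrainingArguments follows. The epoch count, effective batch size, and cutoff length come from the card; the per-device/gradient-accumulation split, learning rate, output path, and precision are assumptions:

```python
from transformers import TrainingArguments

CUTOFF_LEN = 512  # maximum sequence length from the card

training_args = TrainingArguments(
    output_dir="bactrian-x-lora",    # hypothetical output path
    num_train_epochs=10,             # from the card
    per_device_train_batch_size=4,   # assumption
    gradient_accumulation_steps=32,  # assumption: 4 * 32 = 128 effective batch size
    learning_rate=3e-4,              # assumption: a typical LoRA learning rate
    fp16=True,                       # assumption
)
```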

Considerations

Potential biases include artifacts introduced by machine translation and an English-centric cultural bias inherited from the English source datasets.