MBZUAI/bactrian-x-llama-13b-merged

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Context Length: 4K · Published: Jun 19, 2023 · License: MIT · Architecture: Transformer

MBZUAI/bactrian-x-llama-13b-merged is a 13 billion parameter LLaMA-based model developed by MBZUAI, fine-tuned using low-rank adaptation (LoRA). It was trained on a multilingual dataset derived from Stanford-Alpaca-52k and databricks-dolly-15k, translated into 52 languages. This model specializes in multilingual instruction-following, making it suitable for applications requiring understanding and generation across a broad range of languages.


Overview

MBZUAI/bactrian-x-llama-13b-merged is a 13 billion parameter LLaMA-based model developed by MBZUAI, fine-tuned using Low-Rank Adaptation (LoRA). The model is distinguished by its multilingual instruction-following capabilities, achieved by fine-tuning on instruction data translated into 52 languages.

Key Capabilities

  • Multilingual Instruction Following: Trained on a dataset spanning 52 languages, enabling it to understand and respond to instructions across a wide linguistic spectrum (see the loading sketch after this list).
  • LoRA Fine-tuning: Utilizes LoRA for efficient adaptation of the LLaMA-13b base model.
  • Dataset Diversity: Incorporates translated instructions from both the Stanford-Alpaca-52k and databricks-dolly-15k datasets, with outputs generated by gpt-3.5-turbo.
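
Because the LoRA weights are already merged into the base model, the checkpoint can be loaded directly with Hugging Face transformers. The sketch below is a minimal example of this; the model id comes from this card, while the Alpaca-style prompt template, dtype choice, and generation parameters are illustrative assumptions rather than settings documented by the authors.

```python
# Minimal inference sketch for the merged checkpoint.
# Prompt template and generation settings are assumptions; consult the
# Bactrian-X repository for the exact format used during fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MBZUAI/bactrian-x-llama-13b-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B weights in half precision
    device_map="auto",
)

# Example multilingual instruction (Alpaca-style framing, assumed).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTraduis en français : 'The weather is nice today.'\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```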

Good for

  • Multilingual NLP Applications: Ideal for tasks requiring instruction-following in various languages.
  • Cross-lingual Understanding: Useful for research and development in multilingual AI.
  • Resource-efficient Fine-tuning: Demonstrates effective adaptation of large language models using LoRA.

Training Details

The model was trained for 10 epochs with a batch size of 128 and a cutoff length of 512 tokens. The LoRA configuration used a rank (r) of 64, targeting the q_proj, k_proj, v_proj, and o_proj projection modules. Further details and training code are available in the Bactrian-X GitHub repository.
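
As a rough reconstruction of that setup, the sketch below expresses the stated LoRA hyperparameters with the Hugging Face peft library. Only the rank and target modules are given on this card; the base checkpoint id, lora_alpha, and lora_dropout values are assumptions, and the authoritative training script is the one in the Bactrian-X GitHub repository.

```python
# Sketch of the described LoRA setup using peft; values not stated on this
# card are marked as assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumed base checkpoint id for LLaMA-13B; substitute your own copy of the base weights.
base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-13b")

lora_config = LoraConfig(
    r=64,  # rank reported in the training details
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,     # assumption, not documented here
    lora_dropout=0.05, # assumption, not documented here
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```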