DAMO-NLP-SG/mt-llama-7b-delta
MT-LLaMA-7b-delta is a 7-billion-parameter multi-task language model developed by the MT-LLaMA team from Alibaba DAMO Academy and the Chinese University of Hong Kong. It is fine-tuned from LLaMA-7b on a diverse set of tasks from the P3 dataset, covering question answering, classification, summarization, and generation. The model generalizes well both to unseen datasets within seen tasks and to entirely unseen tasks, substantially outperforming the base LLaMA-7b in zero-shot evaluations.
MT-LLaMA-7b-delta: Multi-task Instruction-tuned LLaMA
MT-LLaMA-7b-delta is a 7-billion-parameter model developed by the MT-LLaMA team, a collaboration between Alibaba DAMO Academy and the Chinese University of Hong Kong. Built on the LLaMA-7b base model, it is fine-tuned on a large collection of tasks drawn from the P3 (T0 Train) dataset. As the "-delta" suffix indicates, this repository distributes weight differences relative to the original LLaMA-7b checkpoint rather than the full fine-tuned weights.
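Since only delta weights are published, the full model has to be reconstructed locally. Below is a minimal sketch of that step, assuming the delta is additive over the original LLaMA-7b parameters and that the shapes match; the local paths are placeholders, and any conversion script shipped with the official GitHub repository should be preferred if one exists.

```python
# Sketch: reconstruct MT-LLaMA-7b by adding the released delta weights
# to the original LLaMA-7b checkpoint. Assumes an additive delta and
# matching parameter shapes; paths are placeholders.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

base = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b", torch_dtype=torch.float16
)
delta = LlamaForCausalLM.from_pretrained(
    "DAMO-NLP-SG/mt-llama-7b-delta", torch_dtype=torch.float16
)

# Add each delta tensor to the corresponding base tensor in place;
# state_dict tensors share storage with the model parameters.
base_sd = base.state_dict()
delta_sd = delta.state_dict()
for name in base_sd:
    base_sd[name] += delta_sd[name]

# Persist the merged weights and the tokenizer for later use.
base.save_pretrained("path/to/mt-llama-7b")
LlamaTokenizer.from_pretrained("DAMO-NLP-SG/mt-llama-7b-delta").save_pretrained(
    "path/to/mt-llama-7b"
)
```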
Key Capabilities & Training
The model's training regimen covers a broad spectrum of NLP tasks, each posed as a natural-language prompt (see the sketch after this list):
- Question Answering: Multiple-choice, Extractive, and Closed-Book QA (e.g., CommonsenseQA, SQuAD, HotpotQA)
- Classification: Sentiment and Topic Classification (e.g., IMDB, AG News)
- Generation: Structure-to-Text Generation and Text Summarization (e.g., CommonGen, CNN/Daily Mail)
- Identification: Paraphrase Identification (e.g., MRPC)
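Because P3 casts every task as a natural-language prompt, the model is queried the same way at inference time. The strings below are hypothetical approximations of that prompt style; the exact templates used during training are the ones defined in the P3 collection.

```python
# Hypothetical prompts in the P3/T0 instruction style; the actual
# templates used for training are defined in the P3 collection.
extractive_qa_prompt = (
    "Passage: The Amazon rainforest spans nine countries in South America.\n"
    "Question: How many countries does the Amazon rainforest span?\n"
    "Answer:"
)
sentiment_prompt = (
    "Review: The plot was predictable and the acting felt flat.\n"
    "Question: Is this review positive or negative?\n"
    "Answer:"
)
```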
Performance & Generalization
MT-LLaMA-7b-delta exhibits strong zero-shot generalization. Evaluations show significant performance improvements over the base LLaMA-7b model on both unseen datasets within trained tasks and entirely unseen tasks. For instance, on SQuAD, MT-LLaMA-7b achieves 85.9 F1 / 77.6 EM compared to LLaMA-7b's 29.4 F1 / 11.5 EM. Similarly, on the unseen COPA task, it scores 88.0% accuracy versus LLaMA-7b's 56.0%.
Intended Use
This model is suitable for applications requiring robust performance across multiple NLP tasks, particularly in zero-shot settings where it can generalize to new datasets and tasks without further fine-tuning. Developers can explore its capabilities via the provided GitHub repository.
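As a sketch of zero-shot usage with Hugging Face transformers, assuming the merged checkpoint from the reconstruction step above has been saved to the placeholder path path/to/mt-llama-7b:

```python
# Zero-shot inference sketch with the merged MT-LLaMA-7b checkpoint.
# "path/to/mt-llama-7b" is a placeholder for the locally merged model.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/mt-llama-7b")
model = LlamaForCausalLM.from_pretrained(
    "path/to/mt-llama-7b", torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "Review: The plot was predictable and the acting felt flat.\n"
    "Question: Is this review positive or negative?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=8, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer.strip())
```

Greedy decoding (do_sample=False) mirrors the deterministic setting typically used for zero-shot classification-style evaluation.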