MT-LLaMA-7b-delta: Multi-task Instruction-tuned LLaMA
MT-LLaMA-7b-delta is a 7-billion-parameter model developed by the MT-LLaMA team, a collaboration between Alibaba DAMO Academy and the Chinese University of Hong Kong. The model builds on the LLaMA architecture and has been fine-tuned on a large collection of tasks from the P3 (T0 Train) dataset.
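The "-delta" suffix suggests the checkpoint is distributed as weight deltas relative to the base LLaMA weights, which users would merge locally before use. This is an assumption based on the naming convention (the official repository defines the actual recovery procedure); under that assumption, the merge would look roughly like:

```python
# Hypothetical sketch: recovering full weights from a delta checkpoint by
# element-wise addition. Parameter names and the addition scheme are
# assumptions; consult the official repository for the real procedure.

def apply_delta(base, delta):
    """Add delta values to base weights, parameter by parameter."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta checkpoints do not match")
    return {
        name: [b + d for b, d in zip(base[name], delta[name])]
        for name in base
    }

# Toy example with two tiny "parameter tensors" stored as flat lists.
base = {"embed.weight": [0.1, 0.2], "lm_head.weight": [0.3, 0.4]}
delta = {"embed.weight": [0.05, -0.05], "lm_head.weight": [0.0, 0.1]}
merged = apply_delta(base, delta)
```

In practice the same loop would run over the tensors of two PyTorch state dicts rather than Python lists.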
Key Capabilities & Training
The model's training regimen covers a broad spectrum of NLP tasks, including:
- Question Answering: Multiple-choice, Extractive, and Closed-Book QA (e.g., CommonsenseQA, SQuAD, HotpotQA)
- Classification: Sentiment and Topic Classification (e.g., IMDB, AG News)
- Generation: Structure-to-Text Generation and Text Summarization (e.g., CommonGen, CNN/Daily Mail)
- Identification: Paraphrase Identification (e.g., MRPC)
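In the P3/T0 setup, each of the tasks above is cast as a natural-language prompt that the model answers in free text. The exact templates used in training come from the P3 collection; the ones below are illustrative stand-ins, not the official templates:

```python
# Hedged sketch of P3/T0-style prompting: tasks are verbalized as
# natural-language templates. These templates are illustrative only.

def multiple_choice_prompt(question, choices):
    """Verbalize a multiple-choice QA instance (e.g., CommonsenseQA)."""
    options = " ".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    return f"Question: {question} Choices: {options} Answer:"

def extractive_qa_prompt(context, question):
    """Verbalize an extractive QA instance (e.g., SQuAD)."""
    return f"{context}\nGiven the passage above, answer: {question}"

prompt = multiple_choice_prompt(
    "Where would you find a jellyfish?",
    ["desert", "ocean", "forest"],
)
```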
Performance & Generalization
MT-LLaMA-7b-delta exhibits strong zero-shot generalization. Evaluations show large gains over the base LLaMA-7b model both on unseen datasets within trained task types and on entirely unseen tasks. For instance, on SQuAD it achieves 85.9 F1 / 77.6 EM, compared to LLaMA-7b's 29.4 F1 / 11.5 EM; on the held-out COPA task it scores 88.0% accuracy versus LLaMA-7b's 56.0%.
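The F1 and EM figures above are the standard SQuAD metrics: exact match and token-level F1 between prediction and gold answer, after normalizing case, punctuation, and articles. A minimal sketch of how they are computed:

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """Token-level F1 over the normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```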
Intended Use
This model is suitable for applications requiring robust performance across multiple NLP tasks, particularly in zero-shot settings where it can generalize to new datasets and tasks without further fine-tuning. Developers can explore its capabilities via the provided GitHub repository.