yale-nlp/MDCure-Qwen2-7B-Instruct
Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Nov 1, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

The yale-nlp/MDCure-Qwen2-7B-Instruct is a 7.6 billion parameter instruction-tuned causal language model developed by Yale NLP, initialized from Qwen2-7B-Instruct. It is fine-tuned using the MDCure-72k dataset, a high-quality multi-document instruction dataset. This model is specifically optimized for multi-document processing capabilities, excelling at tasks requiring information synthesis from multiple source texts. It is designed to improve LLM performance on complex multi-document and long-context benchmarks.

Loading preview...

Overview

yale-nlp/MDCure-Qwen2-7B-Instruct is a 7.6 billion parameter model developed by Yale NLP, fine-tuned from Qwen/Qwen2-7B-Instruct. This model leverages the MDCure procedure, an effective and scalable method for generating high-quality multi-document (MD) instruction tuning data. The MDCure pipeline generates diverse MD instructions, filters them using the specialized MDCureRM evaluator model, and then fine-tunes base LLMs to enhance their multi-document capabilities. This specific model was fine-tuned on the MDCure-72k dataset.

Key Capabilities

  • Enhanced Multi-Document Processing: Significantly improves performance on tasks requiring the synthesis of information from multiple source documents.
  • Long-Context Understanding: Demonstrates improved capabilities in handling and reasoning over long-context inputs.
  • Instruction Following: Benefits from instruction tuning on a specialized dataset designed to boost MD instruction-following.
  • Scalable Data Generation: Utilizes the MDCure procedure for cost-effective generation of high-quality MD instruction data.

Good For

  • Applications requiring advanced multi-document question answering or summarization.
  • Tasks that involve processing and integrating information from several distinct text sources.
  • Use cases demanding robust performance on long-context inputs where information is distributed across multiple sections or documents.