yale-nlp/MDCure-Qwen2-1.5B-Instruct
The yale-nlp/MDCure-Qwen2-1.5B-Instruct is a 1.5 billion parameter Qwen2-based instruction-tuned language model developed by Yale NLP. It is specifically fine-tuned using the MDCure-72k dataset to enhance its multi-document processing capabilities. This model excels at tasks requiring the synthesis of information from multiple source documents, offering improved performance over base models in multi-document and long-context benchmarks.
MDCure-Qwen2-1.5B-Instruct: Enhanced Multi-Document Processing
This model, developed by Yale NLP, is a 1.5-billion-parameter variant of the Qwen2-Instruct family, fine-tuned using the MDCure procedure. MDCure is a scalable method for generating high-quality multi-document (MD) instruction-tuning data, designed to significantly improve LLMs' ability to process and synthesize information spread across multiple documents.
Key Capabilities & Features
- Multi-Document Instruction Following: Optimized to handle instructions that require understanding and integrating information across several distinct documents.
- MDCure-72k Dataset: Fine-tuned on the extensive MDCure-72k dataset, which complements existing instruction collections like FLAN.
- Improved Performance: Demonstrates consistent performance improvements (up to 75.5%) over pre-trained baselines and corresponding base models on a wide range of MD and long-context benchmarks.
- Context Handling: Designed to process multiple source documents effectively; for optimal consistency with the training data format, separate input documents with `\n\n` or `<doc-sep>`.
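The document-separation convention above can be sketched in Python. This is a minimal, hedged example: the prompt-building helper and the sample documents are illustrative, and the commented-out inference snippet assumes the standard `transformers` chat-template API for Qwen2-style instruct models.

```python
# Minimal sketch: formatting multiple source documents for the model.
# The card recommends separating documents with "\n\n" or "<doc-sep>";
# we use the latter here. The helper and sample texts are illustrative.

DOC_SEP = "<doc-sep>"

def build_md_prompt(documents, instruction):
    """Join source documents with the recommended separator, then append
    the instruction so the model sees all contexts before the task."""
    context = DOC_SEP.join(doc.strip() for doc in documents)
    return f"{context}\n\n{instruction}"

docs = [
    "Report A: Revenue grew 12% year over year.",
    "Report B: Operating costs rose 8% over the same period.",
]
prompt = build_md_prompt(docs, "Summarize the combined financial picture.")

# To run inference (downloads the 1.5B checkpoint; sketch assuming the
# usual transformers chat-template workflow):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "yale-nlp/MDCure-Qwen2-1.5B-Instruct"
# tok = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id)
# inputs = tok.apply_chat_template(
#     [{"role": "user", "content": prompt}],
#     add_generation_prompt=True,
#     return_tensors="pt",
# )
# out = model.generate(inputs, max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```

Joining documents before appending the instruction keeps all source contexts contiguous, mirroring the multi-document layout the model saw during MDCure fine-tuning.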
When to Use This Model
- Complex Information Synthesis: Ideal for applications requiring the model to answer questions or follow instructions based on information scattered across several text passages.
- Long-Context Tasks: Suitable for scenarios where the input spans multiple documents, going beyond traditional single-document processing.
- Research & Development: A strong candidate for researchers and developers exploring advanced multi-document understanding and instruction-following in LLMs. The underlying MDCure methodology and datasets are also publicly available for further experimentation.