Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering
Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering is an 8-billion-parameter fine-tune of Llama-3.1-8B-Instruct, developed by Yale-BIDS-Chen for evidence relevance classification in medical Retrieval-Augmented Generation (RAG) pipelines. Given a clinical query and a passage, it outputs "Yes" if the passage contains supporting evidence for the query and "No" otherwise. Filtering out irrelevant passages in this way is intended to improve retrieval quality and to make medical RAG systems more reliable and interpretable.
Yale-BIDS-Chen/Llama-3.1-8B-Evidence-Filtering: Medical Evidence Classification
This model is a specialized fine-tune of Llama-3.1-8B-Instruct, developed by Yale-BIDS-Chen for evidence relevance classification within medical Retrieval-Augmented Generation (RAG) systems. It acts as a lightweight classifier that assesses whether a candidate passage provides supporting evidence for a given clinical query.
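The classification step described above can be sketched as a thin wrapper around any text-generation call. This is a minimal sketch, not the authors' implementation: the prompt wording in `PROMPT_TEMPLATE` is hypothetical (the model card does not state the exact prompt format), and `generate` is an assumed callable that sends a prompt to the model and returns its raw text output.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording; the exact format used in training is not
# given in the model card and should be taken from the original repository.
PROMPT_TEMPLATE = (
    "Query: {query}\n"
    "Passage: {passage}\n"
    "Does the passage contain supporting evidence for the query? "
    "Answer Yes or No."
)

# Match a leading "yes" or "no", ignoring case and leading punctuation/space.
_LABEL_RE = re.compile(r"\W*(yes|no)\b", re.IGNORECASE)


def build_prompt(query: str, passage: str) -> str:
    """Fill the (assumed) template with a clinical query and a passage."""
    return PROMPT_TEMPLATE.format(query=query.strip(), passage=passage.strip())


def parse_label(raw_output: str) -> Optional[bool]:
    """Map raw model text to True ("Yes"), False ("No"), or None if unclear."""
    match = _LABEL_RE.match(raw_output)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


def classify(
    query: str, passage: str, generate: Callable[[str], str]
) -> Optional[bool]:
    """Classify one query-passage pair using the supplied generation function."""
    return parse_label(generate(build_prompt(query, passage)))
```

Returning `None` for unparseable output, rather than guessing, leaves the decision about ambiguous generations to the caller.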
Key Capabilities & Features
- Medical Evidence Filtering: Classifies passages as "Yes" (contains supporting evidence) or "No" (does not contain supporting evidence) for a clinical query.
- Improved RAG Quality: Designed to enhance the reliability and interpretability of medical RAG pipelines by filtering out irrelevant information before text generation.
- Fine-tuned Performance: Achieves an F1 score of 0.623 on expert-annotated medical query-passage pairs, an absolute gain of roughly 0.10 over zero-shot Llama-3.1-8B (0.521 F1) and roughly 0.18 over zero-shot GPT-4o (0.442 F1).
- Training Data: Trained on 3,200 expert-labeled query-passage pairs, focusing on the specific task of evidence classification.
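For readers who want to reproduce an F1 figure like the one above on their own labeled pairs, the metric can be computed as follows. This is a sketch under an assumption: it computes binary F1 with "Yes" as the positive class, whereas the model card does not say which averaging the reported scores use. The function name `binary_f1` is illustrative.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Binary F1 with True ("Yes") as the positive class (assumed convention)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    # With no true positives, F1 is defined here as 0.0 to avoid dividing by zero.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with expert labels `[True, True, False, False, True]` and predictions `[True, False, False, True, True]`, precision and recall are both 2/3, so F1 is also 2/3.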
Intended Use Cases
- Medical RAG Systems: Ideal for integration into medical question-answering systems to pre-filter retrieved documents.
- Research Purposes: Intended for researchers working on improving the accuracy and efficiency of information retrieval in clinical contexts.
- Building Interpretable AI: Contributes to more transparent RAG pipelines by explicitly identifying relevant evidence.
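The pre-filtering use case above can be sketched as a small function that sits between the retriever and the generator. It is a minimal sketch under stated assumptions: `classify` is a hypothetical callable returning `True`, `False`, or `None` for a query-passage pair (for instance, a wrapper around this model), and the `keep_uncertain` flag is an illustrative design choice, not something the model card prescribes.

```python
from typing import Callable, Iterable, List, Optional

# Assumed classifier signature: (query, passage) -> True / False / None.
Classifier = Callable[[str, str], Optional[bool]]


def filter_passages(
    query: str,
    passages: Iterable[str],
    classify: Classifier,
    keep_uncertain: bool = False,
) -> List[str]:
    """Keep passages labeled as evidence, preserving retrieval order.

    Passages with an unclear label (None) are dropped unless
    keep_uncertain is True.
    """
    kept: List[str] = []
    for passage in passages:
        label = classify(query, passage)
        if label is True or (label is None and keep_uncertain):
            kept.append(passage)
    return kept


if __name__ == "__main__":
    # Toy stand-in for the model, used only to make the example runnable.
    def toy_classifier(query: str, passage: str) -> Optional[bool]:
        return "metformin" in passage.lower()

    retrieved = [
        "Metformin lowers HbA1c in adults with type 2 diabetes.",
        "The clinic is open on weekdays.",
    ]
    print(filter_passages("Does metformin reduce HbA1c?", retrieved, toy_classifier))
```

Because the kept passages are exactly those labeled "Yes", the list passed to the generator also serves as an explicit record of which evidence the answer was based on.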
For detailed methodology and experimental results, refer to the associated paper: Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights.